Preface


Since human beings stepped into the Internet era, our lives have become deeply
intertwined with the Internet, and many killer applications are delivered over it.
At the same time, motivated by huge financial, political, or other rewards, hackers
spare no effort in committing cybercrimes. Due to the nature of the Internet and the
lack of cyber laws, cyberspace has been a haven for intelligent attackers. It is easy
to launch attacks, but hard to identify the persons who commit them. It is even
harder to bring them to justice.

To date, one critical attack in cyberspace is the distributed denial-of-service
(DDoS) attack. My study of cybersecurity started in 2007. I was attracted to this
topic not only because of the problem itself but also because of the bigger research
field of cyberspace. It is generally agreed that current Internet research lacks a
theoretical foundation. Research indicates that our understanding of cyberspace is
limited and that there is a huge unexplored territory in cyberspace for both academic
and industrial participants.

This brief is mainly based on our research on the DDoS problem. For readers’
convenience, we try to make each chapter relatively independent. Moreover, we pay
special attention to methodology and mathematical modelling, in the hope of
benefiting readers in their own research in related fields. Constrained by my
knowledge and capability, the content of this brief is shallow in terms of
mathematical modelling. However, I have decided not to hide this weakness, in order
to save some time for beginners who may work in the related fields.

I would like to thank the editor of this series, Professor Sherman Shen, for his
constructive guidance and kind help. Thanks also go to the team at Springer for
their patience and assistance.

I would also like to take this opportunity to thank Professor Yue Wu, Professor Yi
Zhang, and Professor Wanlei Zhou for bringing me into the academic world. During
these years of study on DDoS, my colleagues, co-authors, and many other people
offered me their guidance, support, and help, such as Professor Ivan Stojmenovic,
Professor Kai Hwang, and Mr Bin Liu. The list is too long to complete here, but I do
appreciate their time and effort from the bottom of my heart. I especially thank
Professor Weijia Jia from City University of Hong Kong, Professor Weifa Liang from
Australian National University, and Professor Song Guo from University of Aizu, and
their institutions, for the financial support of my visits. I am grateful to
Professor Yong Xiang and Dr Simon James for their continuous discussion and help on
research and paper writing.

In particular, I would like to thank my wife, Su, for her understanding and full 
support for my research. 


Melbourne, Australia Dr Shui Yu 


Contents 


1 An Overview of DDoS Attacks
1.1 Introduction
1.2 How to Launch DDoS Attacks
1.3 Challenges in DDoS Related Research
1.3.1 Malicious Networks
1.3.2 Data Collection of Malicious Networks
1.3.3 Topology Modelling of Malicious Networks
1.3.4 Dynamics of Malicious Networks
1.3.5 Concealed Malicious Activity Detection
1.3.6 Forensics of Malicious Networks
References

2 Malicious Networks for DDoS Attacks
2.1 Introduction
2.2 The Fast Flux Mechanism and Detection
2.2.1 The Fast Flux Mechanism
2.2.2 Fast Flux Detection
2.3 The Domain Flux Mechanism and Detection
2.3.1 The Domain Flux Mechanism
2.3.2 Domain Flux Detection
2.4 Modelling Malicious Networks
2.4.1 The SI Model
2.4.2 The SIS Model
2.4.3 The SIR Model
References

3 DDoS Attack Detection
3.1 Introduction
3.2 Feature Based Detection Methods
3.2.1 Profile Based Detection
3.2.2 Low Rate DDoS Attack Detection
3.3 Network Traffic Based Detection
3.4 Detection Against Mimicking Attacks
3.4.1 Similarity Metrics
3.4.2 Flow Correlation Based Discrimination
3.4.3 System Analysis on the Discrimination Method
References

4 Attack Source Traceback
4.1 Introduction
4.2 Probabilistic Packet Marking Based Traceback
4.3 Deterministic Packet Marking Based Traceback
4.4 Marking on Demand Traceback Scheme
4.4.1 The Framework of Marking on Demand Scheme
4.4.2 System Analysis of the MOD Scheme
4.5 Network Traffic Based IP Traceback
4.5.1 System Model for IP Traceback on Entropy Variations
4.5.2 System Analysis on the Model
References

5 DDoS Attack and Defence in Cloud
5.1 Introduction
5.2 Defeat DDoS Attacks in Cloud
5.2.1 System Model in General
5.2.2 Approximation of the Model
5.2.3 Resource Investment Analysis
5.2.4 System Analysis for Non-attack Cases
5.2.5 System Analysis for Attack Cases
5.3 A Cloud Firewall Framework Against DDoS Attacks
5.3.1 Dynamic Resource Allocation for Cloud Firewall
5.3.2 Single Chain vs Multiple Parallel Chains
References

6 Future Work
References


Chapter 1 
An Overview of DDoS Attacks 


Abstract In this chapter, we first review the short history of denial of service
(DoS) and distributed denial of service (DDoS) attacks. We then explore the
reasons why the current cyberspace is a haven for cyber criminals, such as DDoS
attackers. We present the challenges of the DDoS related research fields from
various aspects, and discuss possible research methods and strategies to meet
these challenges.


1.1 Introduction 


The Internet has become an important part of our society in numerous ways,
such as in economics, government, business, and daily personal life. Further, an
increasing number of critical infrastructures, e.g., power grids and air traffic
control, are managed and controlled via the Internet, in addition to the traditional
infrastructure for communication. However, today’s cyberspace is full of attacks,
such as Distributed Denial of Service (DDoS), information phishing, financial fraud,
email spamming, and so on.

We can see that cyberspace has become a haven for intelligent criminals, who
are motivated by significant financial or political reward. According to an annual
report from the FBI’s Internet Crime Complaint Centre, the financial loss resulting
from cyber attacks totalled US$559.7 million in 2009. Symantec identified more
than 5.5 billion malicious attacks in 2011, an increase of 81% over the previous
year. Moreover, the number of unique malware variants increased to 403 million,
and the number of web attacks per day increased by 36%.

Among various Internet based attacks, the Denial of Service (DoS) attack is a
critical and continuous threat in cyber security. In general, DoS attacks are
implemented by either forcing a victim computer to reset, or consuming its resources,
e.g., CPU cycles, memory or network bandwidth. As a result, the targeted computer can
no longer provide its intended services to its legitimate users. When a DoS attack
is organized by multiple distributed computers, it is called a distributed denial of
service attack, which is a popular attack method in the cyberspace. From classical
textbooks, we know that security falls into three categories: confidentiality,
integrity and availability. DDoS attacks clearly belong to the availability category.

The idea of denial of service has been in place for a long time in human history,
for example in the sieges of cities in ancient wars. The concept first appeared in
the digital world in 1984 in research on operating systems [1]. With the booming of
the Internet in the middle of the 1990s, DDoS attacks have become more and more
familiar to the general public. There are numerous survey papers on DDoS attacks from
various perspectives, such as [2] and [3].

It is reported that there were only six DDoS related attacks in 1988, and that the
number of attacks has since been increasing exponentially. At the same time,
attack rates have repeatedly reached new highs. In 2000, well-known web sites,
such as CNN, Amazon and Yahoo, became the targets of DDoS attacks, and the
attack rate was around 1 GB per second. A report showed that a DDoS attack rate
reached 70 GB per second in 2007. As we were writing this book, many Internet users
experienced the “biggest DDoS attack” in history in March 2013, whose peak
reached 300 GB per second. We believe this record will be broken again in the
near future.

The purpose of early attacks was mainly fun and curiosity about technology.
However, we have recently witnessed an explosive increase in cyber attacks due
to the huge financial or political rewards for cyber attackers. News of DDoS
attacks occupies the headlines of newspapers from time to time. It is no surprise
that many nations have established their cyber armies. For example, on June 19,
2012, the Washington Post reported that the US and Israeli governments launched
two sophisticated viruses, Flame and Stuxnet, in order to disrupt Iran’s petroleum
production and distribution infrastructure and its uranium-enrichment facilities.

Despite all the efforts from industry participants and academia, the DDoS attack is
still an open problem, and there are many challenges that we have to overcome.
For example, we are embarrassed when facing inquiries from the public, such as
“who are the cyber criminals?” and “where are they?” We list some of the essential
reasons for this passive situation as follows.


1. The lack of security in the design of the ARPANET network. As we know, the
Internet came from a private network, ARPANET. As a private network, there were
very limited security concerns in the original design. However, the private
network became a public network in the 1990s, and now many killer applications
are running on the Internet, such as e-business. Many security patches have been
developed and installed to work around the inherent vulnerabilities; however, the
effectiveness of these efforts is sometimes limited. For example, the Internet
was designed in a stateless style; therefore, a receiver has no information about
which routers a received packet went through. Moreover, it is easy to perform
source IP spoofing.

2. The Internet is the largest man-made system in human history. The cyberspace
is huge and complex, and remains in a state of anarchy. It is impossible to impose
a policy on all parties of the Internet, and collaboration among different ISPs
to implement security policies is hard to achieve. More importantly, there are ISPs
who support malicious activities for financial or political purposes.

3. Cyber attackers enjoy one incredible advantage of the cyberspace: it
is hard for defenders to technically identify attackers. Moreover, there is a lack
of international laws or agreements among nations to bring to justice cyber
criminals who commit crimes in one country but live in another.

4. Hacking tools and software are easy to obtain. As a result, an attacker may not
need profound knowledge of networking or operating systems to initiate a cyber
attack.


1.2 How to Launch DDoS Attacks 


In general, DDoS attacks can be launched in two forms. The first aims to crash
a system by sending one or more carefully crafted packets, which are designed based
on a vulnerability of the victim. An example is the “ping-of-death” attack, which
can cause some operating systems to crash, freeze, or reboot. This form of DDoS can
be defeated by patching the system vulnerabilities. The second form of DDoS uses
a large amount of traffic to exhaust the resources of a victim, such as network
bandwidth, computing power, operating system data structures, and so on. As a
result, the quality of service that the victim offers to its legitimate clients is
significantly degraded, or the service is disabled altogether. Compared with the
first form, the second form of DDoS attack is hard to deal with. In the rest of this
book, we focus on this kind of DDoS attack.

In order to launch an effective DDoS attack, cyber attackers first have to
establish a network of computers, which is known as a botnet or army. We call
the people who control a botnet botmasters or botnet owners.

In order to organize a botnet, attackers take advantage of various methods to
find vulnerable hosts on the Internet and gain access to them. Attackers generally
use different kinds of techniques (referred to as scanning techniques) to find
vulnerable machines [4]. The next step for the attacker is to install programs (known
as attack tools) on the compromised hosts. The hosts running these attack tools are
known as bots or zombies [2,5,6]. The headquarters of a botnet is called the command
and control (C&C) server. It is necessary for a C&C server to communicate with its
bots for a number of reasons, such as updating the attack tools and issuing an
attack order.

In order to protect their C&C servers from detection, botnet programmers may
set up a few intermediate nodes as stepping stones between the C&C server and
the bots. They also use cryptographic techniques to encrypt the messages of their
communication. Moreover, in order to avoid eviction, botnet programmers
use various techniques, such as IP flux or domain flux, to sustain their C&C
servers. Consequently, they also need to design novel strategies for their bots to
phone home.

There are two different DDoS attack classes: the typical DDoS attack and the DRDoS
(Distributed Reflection Denial of Service) attack. The hosts in both classes are
compromised machines that have been recruited during the scanning process and
have had malicious code installed.

Fig. 1.1 A typical distributed

As shown in Fig. 1.1, in a typical DDoS attack, an attacker coordinates and orders
the C&C server, which in turn coordinates and triggers the bots. More specifically,
the attacker sends an attack command to the C&C server, which activates all attack
processes on the bots; these processes are in hibernation, waiting for the
appropriate command to wake up and start attacking. The C&C servers, through these
processes, then order the bots to mount a DDoS attack against a victim.
In this way, the bots begin to send a large volume of packets to the victim,
flooding its system with useless load and exhausting its resources.

Unlike a typical DDoS attack, a DRDoS attack network consists of C&C servers
and reflectors as shown in Fig. 1.2. The scenario of this type of attack is the same
as that of a typical DDoS attack up to a specific stage. The attackers have control
over C&C servers, which, in turn, have control over bots. The difference in a DRDoS
attack is that bots, led by C&C servers, send a stream of packets with the victim’s IP
address as the source IP address to other uninfected machines (known as reflectors).
This causes these innocent machines to respond to the victim because they believe
that the victim was the host that sent the requests. As a result, a large amount of
traffic reaches the victim from the reflectors in response to connections that the
victim never opened.

Researchers from academia and industry have proposed a number of methods
to defend against the DDoS threat. Despite these efforts, DDoS attacks still remain
a huge threat. Attackers manage to discover new weaknesses in computer systems
and communication protocols after known weaknesses have been patched. In
some cases, attackers also study defence mechanisms in order to develop attacks
that defeat these mechanisms, or exploit them to generate false alarms and cause
catastrophic results.

In general, DDoS defence can be classified into three categories: detection,
mitigation and traceback. We will explore these three aspects in the rest of this
book.

As we know, the ICT world develops quickly, and we witness new applications
from time to time. Whenever there is a new computing platform or new computing
model, cyber criminals quickly develop tools and weapons to carry out their
malicious tasks on it.

With the booming of cloud computing, cyber attackers have targeted this new
computing platform. We have seen reports on DDoS attacks in clouds. A variation of
a DDoS attack in cloud computing is the Economic Denial of Sustainability (EDoS)
attack [7] or the Fraudulent Resource Consumption (FRC) attack [8]. Much research
has also been done in this field. For example, Lua and Yow [9]
proposed to establish a large swarm network to mitigate DDoS attacks on clouds, in
which an intelligent fast-flux technique is used to balance the work load. However,
we believe that there are many new questions to be answered on this battle ground.
We list some of the interesting questions as follows.


1. Can a super DDoS attack disable the service of a cloud data center? 

2. How should we deal with DDoS attacks on cloud hosted services? 

3. How can we prevent malicious parties from renting cloud resources to mount DDoS
attacks?

4. What should a cloud firewall look like? 


1.3 Challenges in DDoS Related Research 


As we have seen in the previous sections, defenders are quite passive and vulnerable
against DDoS attacks due to the lack of security in the original design, the
complexity and very large scale of the Internet, and its anarchic management. We
cannot change these factors as they are already in place. We are more interested in
how to address the problem. In our understanding, we can address the problem from
three aspects as follows.


1. Understand the cyberspace theoretically and deeply. Because our understanding
of the cyberspace is very limited, the American National Research Council
proposed network science as a new research field in 2006, aiming to advance
our knowledge of networks and networking [10]. The Internet is a major object of
study in network science. Moreover, the majority of currently dominant Internet
models are based on the random graph model proposed in 1959 [11], long before
the birth of the Internet and the Web. More and more recent observations indicate
that there is a great discrepancy between random graph based models and reality.
Since around the end of the last century, new discoveries and models of the
Internet and the Web have been reported constantly, such as the small world
model [12], the scale free model [13], and complex networks [14]. The power law
(usually represented by the Zipf distribution or the Pareto distribution) has been
found to be pervasive in nature, economics and man-made systems, e.g., in
individual income among a group of people and in word frequency in a language.
Scientists have also found many power law phenomena in the cyberspace. For
example, the popularity of web pages follows the Zipf distribution [15], and the
size of web documents follows the Pareto distribution [16]. Is the power law
pervasive or dominant in the cyberspace? Researchers cannot answer this question
so far. Moreover, IEEE is launching a new journal, IEEE Transactions on Network
Science and Engineering, which focuses on network science.

2. Understand our cyber opponents correctly. Due to security and privacy
reasons, it is hard for us to collect or share cyber attack data from industry
participants and government agencies. As a result, we can only imagine our cyber
opponents from partial or even misleading information. In order to win the battle,
we have to understand our opponents correctly and in time.

3. Design and implement effective and efficient strategies to beat cyber crimes. With
solid output from the previous two aspects, we can devise effective strategies to
beat cyber attacks, including DDoS attacks. However, this step still looks some way
off, as we are struggling at the first and second steps.
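
The power-law question raised in the first item can at least be probed empirically on
a given data set. The following is a minimal sketch, assuming a list of per-item
counts (for example, hits per web page) is available; the function name, the sample
data and the use of a least-squares fit on the log-log rank-frequency plot are our
own illustrative choices. A straight-line fit of this kind is only a rough check and
is known to be a crude estimator of the exponent.

```python
import math

def zipf_exponent(frequencies):
    """Estimate s in f(r) ~ C / r**s by a least-squares line fit
    on the log-log rank-frequency plot (rank 1 = most frequent)."""
    ranked = sorted((f for f in frequencies if f > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is -s, so negate it to report the exponent.
    return -sxy / sxx

# Hypothetical page-hit counts that follow f(r) = 1000 / r exactly.
hits = [1000.0 / rank for rank in range(1, 51)]
exponent = zipf_exponent(hits)
```

For the synthetic counts above the estimate is close to 1, the classical Zipf
exponent; real traffic data would normally deviate from a perfect line.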


In the rest of this chapter, we discuss these three aspects in detail for
readers interested in the unfolding battle against cyber crimes.


1.3.1 Malicious Networks 


The botnet has become the engine of cyber attacks, and it is a typical and dominant
malicious network. In this section, we present a summary of what has been done
by the research community in this field; more detailed information about botnets
will be discussed in Chap. 2.

A botnet is a group of compromised computers on the Internet, and is controlled
by botmasters through command and control centres (also referred to as CC).
Examples of botnets include DSNXbot, evilbot, G-Sysbot, sdbot, and Spybot [5,17].
Botnets are the major attack networks behind various attacks in current cyberspace.
Botnets are pervasive, existing simultaneously in many commercial, production and
control networks. The size of a botnet can be as large as millions of machines [18].
Because of this large number of machines, botnets can be lethal in bringing down
targeted networks, such as power grids, air traffic control networks, or
communication networks.

On top of the complex structure and dynamics of the cyberspace, botnet
owners spare no effort to disguise botnet activities and traces against
detection and elimination. Attackers have at their disposal state-of-the-art
techniques, such as stepping stones, reflectors, IP spoofing [2, 3], code
obfuscation, memory encryption [19], and peer-to-peer implementation technology
[3, 17] to cover and sustain their bots. One critical issue for botnet writers is
making sure that all bots can contact their CC centre while the physical server and
IP address of the CC centre keep changing frequently in order to avoid detection or
elimination. One common practice by botmasters is “IP fast-fluxing”, where the botnet
owner constantly changes the IP addresses mapped to a CC server [20]. The
shortcoming of this method is that the botnet is easy to destroy once the
domain name is known to defenders. In order to overcome this disadvantage, the
writers of botnets such as Conficker, Kraken and Torpig have recently developed a new
method, “domain fluxing” [5], where each bot algorithmically generates a large set
of domain names and queries each of them until one of them is resolved; the bot then
contacts the resolved IP address, which is typically used to host the
command-and-control server [21]. If the two methods are combined, detection becomes
even more difficult. The current method against domain fluxing
is to catch bots using honeypots and to reverse engineer them to obtain the domain
name generation algorithm, which is time consuming and of low accuracy. It is hard to
detect bots because the infected computers continue to function as normal machines.
Besides the techniques that hackers are using, the duration of botnet activities is
usually short and appears random to defenders, which makes it tougher for defenders
to collect botnet related data.
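
To make the domain-fluxing idea concrete, the sketch below shows the general pattern
from a defender’s point of view: once the generation algorithm and its inputs are
known, a defender can enumerate the same candidate names in advance. This is a toy
under stated assumptions, not the algorithm of Conficker, Kraken, Torpig or any real
botnet; the hash construction, the seed, the reserved `.example` suffix and the
`resolver` callable are all hypothetical.

```python
import hashlib
from datetime import date

def candidate_domains(seed, day, count=5, tld=".example"):
    """Derive `count` pseudo-random names from a shared seed and the date.
    Anyone who knows the seed and the date obtains the same list."""
    domains = []
    for index in range(count):
        material = f"{seed}|{day.isoformat()}|{index}".encode()
        digest = hashlib.sha256(material).hexdigest()
        domains.append(digest[:12] + tld)
    return domains

def first_resolving(domains, resolver):
    """Query the names in order and stop at the first one that resolves.
    `resolver` is a stand-in for a DNS lookup: name -> address or None."""
    for name in domains:
        address = resolver(name)
        if address is not None:
            return name, address
    return None

# Hypothetical scenario: only the fourth candidate has been registered.
names = candidate_domains("demo-seed", date(2013, 3, 1))
fake_dns = {names[3]: "192.0.2.10"}
hit = first_resolving(names, fake_dns.get)
```

Because the list depends only on the seed and the date, whoever registers a candidate
name first receives the queries for that day, and the earlier failed lookups leave a
trail of DNS request failures that can be observed.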

Botnets have been investigated from various angles for around 10 years. McGrath
and Gupta analysed characteristics such as IP address distribution, WHOIS
records and lexical features of phishing and non-phishing URLs [22]. Researchers
employ statistical learning techniques based on lexical features (length of domain
names, host names, number of dots in the URL, etc.) and other features of URLs to
automatically determine whether a URL is malicious, i.e., used for phishing or
advertising spam [23]. Perdisci et al. implemented a detection mechanism against IP
fast fluxing based on passive DNS traffic analysis [20]. Xie et al. focused on
detecting spamming botnets by developing regular expression based signatures from a
data set of spam URLs [24]. Recently, special techniques have been developed for
detecting botnets. It is expected that infiltrated or subverted machines (acting as
bots) will contact the botmaster at regular time intervals, and these contact times
can yield an opportunity for their detection [5, 18]. BotHunter [25] and BotMiner
[26] are two such tools employing these techniques. Network telescopes have been
employed to observe malicious traffic at various vantage points of networks [27].
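
As an illustration of the lexical features mentioned above, the following minimal
sketch extracts a few of them from a URL. The function name and the exact feature
set are our own assumptions for illustration; the cited works use richer features
and feed them into statistical classifiers, which are not shown here.

```python
from urllib.parse import urlsplit

def lexical_features(url):
    """Return a few simple lexical features of a URL as a dictionary."""
    host = urlsplit(url).hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "dots_in_url": url.count("."),
        # Number of dot-separated labels in the host name.
        "host_labels": len(host.split(".")) if host else 0,
        "digits_in_host": sum(ch.isdigit() for ch in host),
    }

# Hypothetical URL used only to exercise the function.
features = lexical_features("http://login.example.com/a.b?x=1")
```

Each URL becomes a small numeric vector, which is the form that a statistical
learning method expects as input.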

The above covers the majority of the work from the cyber security community. From
this discussion, we can see that our understanding of botnets or malicious networks
is shallow, and many questions remain to be answered.


1.3.2 Data Collection of Malicious Networks 


For privacy and security reasons, it is hard to collect attack data from ISPs and
related companies. The available data sets are usually collected by honeypots [28],
global experimental networks, such as PlanetLab [29], or large scale monitoring
systems [30,31]. The problem is that the data are usually not exactly what we
expect. For instance, the observation range is not what we desire, or some
information, e.g. routes, is missing. Because of this, we desperately need
mathematical tools to infer a complete picture of a malicious network from the
partial information that we have, such as the recently invented tool of compressed
sensing [32]. There is plenty of work to do to fit the related mathematical tools to
the problem that we face.

Even with the expected data in place, we still face a data processing challenge.
The data we collect are almost certainly a mixture from numerous malicious networks.
For example, a DNS request failure data set that we obtain is usually the result of
multiple botnets. A study of a mixed data set will mislead us with a high probability.
In order to study the features of an individual malicious network, we have to separate
the mixed data into clusters. We conducted a preliminary study at a campus
network in 2011, which indicated that the bots in the campus network belonged to
different botnets, and that one computer often hosted bots of multiple botnets.

The challenge here is that we do not know how many botnets there are in a
collected data set. Previous studies on cyber security indicate that bot behaviour
within one botnet usually possesses high similarity compared with the bot behaviour
of other botnets [33, 34]. The similarity could be identified in temporal, spatial or
other features. The similarity within one botnet may differ from that within another,
and it is not an easy job to identify the similarity of a given botnet.

Unsupervised machine learning is an existing and promising tool for
the clustering challenge. The approaches of unsupervised learning fall into two
categories: clustering and blind signal separation. Researchers have proposed many
algorithms in this research field, such as principal component analysis, singular
value decomposition, mixture models, k-means, and hierarchical clustering [35].
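
As a small illustration of one of these algorithms, the sketch below runs plain
k-means (Lloyd’s algorithm) on hypothetical two-dimensional behaviour features of
six hosts. The feature values, the initial centroids and the function name are
assumptions made for the example. Note that the number of clusters must be supplied
through the initial centroids, which is exactly the unknown discussed above.

```python
def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm; `centroids` supplies the initial guesses
    and therefore also fixes the number of clusters."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical per-host features, e.g. (failed DNS queries per hour, mean gap in minutes).
hosts = [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8), (8.0, 8.2), (7.9, 8.1), (8.3, 7.7)]
labels, centres = kmeans(hosts, centroids=[hosts[0], hosts[3]])
```

With these well-separated points the first three hosts end up in one cluster and the
last three in the other; real mixed traffic is rarely this clean.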

Besides these traditional methods, we also noticed a recently developed
technique, graph spectrum [36], which is also promising for addressing the challenge.
Researchers have found that bots from one botnet have more connections with each
other, e.g. the Sybil attacks in cyberspace [37, 38]; however, the connections
amongst different malicious networks are actually very limited or absent according
to a recent observation [39]. A graph of connections can be established among the
nodes in the mixed data set. Based on the graph we can obtain an adjacency matrix,
which can be further transformed into the spectral space, where the nodes belonging
to the same botnet will form a straight line in theory. As a result, we can
accurately separate botnets in the spectral space.

We note that Big Data is an extremely hot topic at the moment. The techniques
from Big Data research are widely expected to help address the problems that we
mentioned in this subsection.


1.3.3 Topology Modelling of Malicious Networks 


Topology modelling of the Internet and of malicious networks is the toughest
challenge. The topology of a network is a piece of critical information,
as physicists believe that structure determines function. Therefore, it is especially
important for us to understand the topology of botnets or other malicious networks.
If we know the topology of a given botnet, then we can figure out the key nodes of
the network. As a result, we can work with a limited number of ISPs or organizations
to fight against the botnet, e.g. to terminate possible attacks or block the
communication paths of bots.

However, so far our understanding in this aspect is extremely limited. A
possible reason is that the data we have is usually “flat”. For example, when we
catch a malicious packet, we only know its source IP address and destination
address; the path from the source to the destination is usually hard to
obtain.

Graph theory is a traditional and good tool for network topology modelling.
Another popular method for network topology is network tomography [40, 41].
However, both of these methods study static graphs or networks. It is easy to
see that these tools are not sufficient to model the ever changing Internet or
malicious networks. Moreover, we have to point out that the majority of current
network models are tied to their underlying physical network nodes and links. In
our understanding, the following two directions are promising to explore on top of
the traditional theories and tools.


• Logical topology. Current network topology models are tied to the physical
networks, which may not reflect the reality of overlay networks such as botnets.
A logical model can probably represent a botnet more practically on top of the
physical nodes and links.

• Dynamic graph. Traditional graph theory focuses on static graphs; however,
the Internet and botnets change constantly. Therefore, it is necessary
to inject dynamic elements into classical graph theory to reflect the real
situation of malicious networks or the Internet.


1.3.4 Dynamics of Malicious Networks


Botnet dynamics includes many aspects; the most important one is the number of
bots of a given botnet over time, or simply the size of the botnet. This information
is valuable to defenders, who can organize their defence and budget its cost
better with this information in place.

There is some work on the size of botnets. A direct method of counting the number
of bots is to perform botnet infiltration and count bot IDs or IP addresses. Stone-Gross
et al. [5] registered the domain name of the Torpig botnet before the botmaster did,
thereby hijacking the C&C server for 10 days, and collected about 70 GB of data from
the bots of the Torpig botnet. They reported that the footprint of the Torpig botnet
was 182,800, and the median and average sizes of Torpig’s live population were 49,272
and 48,532, respectively. They found 49,294 new infections during the 10-day takeover.
Their research also indicated that the live population fluctuated periodically because
users switch between online and offline. Another method is DNS redirection. Dagon
et al. [42] analyzed bots captured by honeypots, identified the C&C server using
source code reverse engineering tools, manipulated the DNS entry
related to a botnet’s IRC server, and redirected the DNS requests to a local sinkhole.
They were therefore able to count the number of bots in a botnet. Their method
counts the footprint of the botnet, and they reported that the size of a botnet
(footprint) can reach 350,000. Rajab et al. [18] pointed out that counting
the unique IP addresses of bots is inaccurate, because DHCP and NAT techniques are
employed extensively on the Internet ([5] confirms this with the observation that
78.9 % of the infected machines were behind a NAT, VPN, proxy, or firewall). They
therefore proposed to examine the hits of DNS caches to find a lower bound on the
size of a given botnet.
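
The difference between counting bot IDs and counting IP addresses can be made
concrete with a small sketch. The record layout (Contact, bot_id, src_ip) and the
toy log below are assumptions made for illustration only; they are not taken
from [5], [18], or [42].

```python
from typing import Iterable, NamedTuple


class Contact(NamedTuple):
    """One bot check-in seen at a sinkhole (hypothetical record layout)."""
    bot_id: str   # identifier the bot reports about itself
    src_ip: str   # source address seen by the sinkhole


def footprint_estimates(contacts: Iterable[Contact]) -> dict:
    """Return the number of distinct bot IDs and distinct source IPs."""
    ids, ips = set(), set()
    for c in contacts:
        ids.add(c.bot_id)
        ips.add(c.src_ip)
    return {"unique_ids": len(ids), "unique_ips": len(ips)}


# Toy log: bot "a" changes address through DHCP churn (overcount by IP),
# while bots "b" and "c" share one NAT address (undercount by IP).
log = [
    Contact("a", "10.0.0.1"),
    Contact("a", "10.0.0.7"),
    Contact("a", "10.0.0.9"),
    Contact("b", "192.0.2.5"),
    Contact("c", "192.0.2.5"),
]
estimates = footprint_estimates(log)
```

In this toy log the two counts disagree, and the error of the IP count can go in
either direction, which is why an IP count alone is a weak size estimate.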

The main challenge of this field is the following: for a given botnet or malware and
a given range of the network, what is the density of bots or malware in the network?
There is plenty of research on the recruitment of malware networks based on
epidemic theory [43, 44]; however, research on malware or bot distribution
is limited. To date, we only know that the distribution is non-uniform, based on
information theory [45], and that the network topology has a big impact on the
spread of malware [46].

The dominant tool for the size issue is the epidemic model, which is the major
theory for modeling the propagation of biological viruses and is also used by computer
scientists [43]. As the member recruitment of botnets is essentially the same
as that of computer viruses, epidemic theory appears to be effective
for modeling the size of a botnet. However, researchers have noticed that current
computer virus models lack accuracy after the early stage of propagation [44].
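
As a minimal illustration of the epidemic approach, the sketch below integrates the
simple susceptible-infected (SI) model dI/dt = beta * I * (N - I) / N with Euler
steps. The choice of the SI variant, the function name simulate_si, and all
parameter values are assumptions made for this example; they are not the models
of [43] or [44].

```python
from typing import List


def simulate_si(population: int, initial_infected: int, beta: float,
                steps: int, dt: float = 1.0) -> List[float]:
    """Euler integration of the SI model dI/dt = beta * I * (N - I) / N."""
    infected = float(initial_infected)
    history = [infected]
    for _ in range(steps):
        growth = beta * infected * (population - infected) / population
        # Clamp so that rounding can never push the count above N.
        infected = min(float(population), infected + growth * dt)
        history.append(infected)
    return history


# Illustrative values only: 100,000 vulnerable hosts, 10 initial bots.
curve = simulate_si(population=100_000, initial_infected=10, beta=0.5, steps=60)
```

The resulting curve grows almost exponentially at first and then saturates as
susceptible hosts run out. The late stage is where the text above notes that
current models lose accuracy.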

As botnet dynamics is mainly related to time, time series analysis
methods are probably effective for addressing this problem. Many questions remain
unanswered, e.g., what are the periodicity and frequency of the various bot
recruitment and attacking activities? What is the distribution of a specific botnet
or virus? How many Internet nodes have been compromised since the beginning of a
botnet?


1.3.5 Concealed Malicious Activity Detection


There are many intrusion detection and virus detection algorithms in place [30, 31],
but only a limited number of algorithms for detecting malicious activity in the
current literature. We do not know how many illegal activities go undetected by current
detection systems. The false negative rate is an essential challenge for us, since
attackers make every effort to disguise their malicious traces. In some
cases, malicious bots behave normally most of the time in order to
fool our detection systems.

In order to address these issues, it is necessary to integrate an understanding of
human criminal behaviour with information techniques to reduce the false negative
rate of detection as much as possible. For a long time, the network security
community has focused on technology-oriented methodologies and ignored the
human aspect of criminal behaviour, which could greatly enhance our understanding of
criminals. There is some work in this direction [33, 34]. Game theory [47]
and social network technologies [37, 48] should be employed in the design of
detection algorithms for concealed malicious activities.

In particular, the following two aspects should be investigated. 


1. Identifying the boundary of detection for a given level of security investment 
using game theory. It is obvious from the attackers’ viewpoint that a high 
frequency of malicious activity results in a high probability of being detected. 
For example, frequent vulnerability scanning or sensitive data downloading 
will make the compromised computer stand out from its peers. There is a 
threshold at which malicious activity is far more prone to detection. Presently, 
the network security community has no conception of where this boundary lies. 
It is worthwhile to explore this boundary between detectable and undetectable
activity using game theory and to identify the Nash equilibrium (if it exists). With
the boundary in hand, we can estimate the false negative probability of
detection. With this information in place, researchers can develop detection
algorithms with a strictly low false negative rate, which can push the threshold to
a minimum, consequently suppressing the frequency of malicious activities.

2. Identifying malicious nodes using social network technologies. In general, we
can divide all Internet-based nodes into two groups, benign and malicious (e.g.,
members of one specific botnet). It has been shown that the communication
among the nodes within each group is quite rich, whereas there is much less
communication among nodes from different groups. Therefore, for a given node,
the probability that the node is malicious increases if the node has a certain
amount of communication with known malicious nodes.
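
The second aspect can be sketched as a simple scoring rule: for each unlabeled
node, compute the share of its communication peers that are already known to be
malicious. The function name, the 0.5 cut-off, and the toy graph are hypothetical
choices for illustration; the text above does not prescribe a specific score.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def malicious_contact_ratio(
    edges: Iterable[Tuple[str, str]], known_malicious: Set[str]
) -> Dict[str, float]:
    """For every unlabeled node, return the share of its distinct
    communication peers that are already known to be malicious."""
    peers = defaultdict(set)
    for a, b in edges:
        peers[a].add(b)
        peers[b].add(a)
    return {
        node: len(nbrs & known_malicious) / len(nbrs)
        for node, nbrs in peers.items()
        if node not in known_malicious
    }


# Toy communication graph: "x" talks mostly to known bots, "y" mostly to
# benign hosts.
edges = [("x", "bot1"), ("x", "bot2"), ("x", "h1"),
         ("y", "h1"), ("y", "h2"), ("y", "bot1"), ("y", "h3")]
scores = malicious_contact_ratio(edges, known_malicious={"bot1", "bot2"})
suspects = sorted(n for n, s in scores.items() if s >= 0.5)  # assumed cut-off
```

A ratio is only one possible choice. A probabilistic model over the communication
graph would serve the same purpose.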


1.3.6 Forensics of Malicious Networks 


Cyber forensics is an attractive topic, and is extremely important as we have
more and more killer applications in cyberspace. However, there is not much work
in this field. One solid topic is IP traceback, which refers to the capability
of identifying the actual source of malicious packets sent across the Internet.
Current methods of traceback rely on independent local networks with no global
coordination. They are hence incapable of accurately tracing back cyber criminals
at the Internet level. We can categorize the methods of IP traceback into three major
groups: deterministic packet marking (DPM for short) [49-51], probabilistic packet
marking (PPM for short) [52, 53], and information-theoretic methods [54]. The
first strategy marks IP packets at the source local area network where the packets
are generated, whereas the second strategy marks incoming packets at the edge
routers of the local area network where the potential victim resides. Both of these
strategies require routers to inject marks into individual packets. Moreover, the PPM
strategy can only operate in a local range of the Internet (e.g., an ISP network) that
the defender has the authority to manage. However, such ISP networks are
generally quite small, and we cannot trace back to attack sources located outside
the ISP network. The DPM strategy requires all Internet routers to be updated for
packet marking. However, with only 25 spare bits available in an IP version 4 packet,
the scalability of DPM is a huge problem. Moreover, the DPM mechanism poses
an extraordinary storage challenge for packet logging at routers. Therefore, it
is infeasible in practice at present. Further, both PPM and DPM are vulnerable to
hacking, which is referred to as packet pollution. The third method measures the
variation of flow entropy at routers to trace back to attack sources. It overcomes the
disadvantages of the previous two; however, it needs global collaboration, which
is hard to achieve.
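
The quantity behind the third method can be illustrated as follows: the sketch
computes the Shannon entropy of the flows observed at one router before and during
a flood. Labelling a flow by an (upstream router, destination) pair and the packet
counts used here are assumptions for this example; the procedure of [54] is not
reproduced.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def flow_entropy(packets: Iterable[Hashable]) -> float:
    """Shannon entropy (bits) of the packet share of each flow label."""
    counts = Counter(packets)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Each packet is labelled by its flow, here an (upstream router, destination)
# pair. Four equally busy flows represent normal traffic.
normal = [("r1", "a"), ("r2", "b"), ("r3", "c"), ("r4", "d")] * 25
attack = normal + [("r2", "victim")] * 900   # one flow now dominates

h_normal = flow_entropy(normal)
h_attack = flow_entropy(attack)
variation = h_normal - h_attack
```

When a single flow dominates, the entropy drops sharply. A router observing such
a variation can point to the upstream neighbour that carries the dominant flow.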

Attack source inference is a practical approach in today’s cyber environment, as
direct traceback is almost impossible. In this case, a Bayesian inference network
is probably a good choice. The research community desires effective and
efficient tools to carry out cyber forensics tasks.


Chapter 2
Malicious Networks for DDoS Attacks 


Abstract In this chapter, we explore botnets, the engines of DDoS attacks in
cyberspace. We focus on two recent techniques that hackers use to sustain their
malicious networks: fast fluxing and domain fluxing. We present the mechanisms of
these two techniques and survey the detection and anti-attack methods that have
been proposed against them in the literature.


2.1 Introduction 


Nowadays, there are numerous malicious attacks in cyberspace. These attacks
are pervasive on the Internet, and often cause great financial loss [1, 2]. Botnets
are the engines behind the majority of the attacks. A botnet is usually established
by a botnet writer who develops a program, called a bot or agent, and installs the
program on compromised computers on the Internet using various techniques. All the
bots of a botnet are controlled by a botmaster. The hosts running these programs are
known as zombies [1, 3, 4]. A botnet has one or more command
and control (C&C) servers that communicate with the bots and collect data from them.
To hide from law enforcement, the botmaster changes the URL of the
C&C server frequently, e.g., weekly. An excellent explanation of this can be found
in [3].

Motivated by huge financial or political reward, attackers find it worthwhile to 
organize sophisticated botnets for use as attack tools. There are numerous types of 
botnets in cyberspace, such as DSNXbot, evilbot, G-Sysbot, sdbot, and Spybot [3]. 
On one hand, researchers have studied botnets from various perspectives, including 
botnet probing events [5], Internet connectivity [6], size [7], and domain fluxing 
[8, 9]. On the other hand, botnet owners have at their disposal state-of-the-art
techniques, such as stepping stones, reflectors, IP spoofing [1, 10], code obfuscation,
memory encryption [11], and peer-to-peer implementation technology [10, 12, 13],
to sustain their botnets and disguise their malicious activities and traces.

A report from Symantec’s MessageLabs shows that 90.4 % of all emails were spam
in June 2009. Among all spam, 83.2 % was sent through botnets. In addition, many
spam emails included viruses, phishing attacks, and web-based malware. Therefore,
sending spam through botnets can help to conduct further network attacks [14].

Researchers have applied signature-based methods to detect botnets for a long
time. These signature-based techniques have been widely employed by some
Honeynet projects, as discussed in [15, 16]. However, these methods
cannot detect newly developed botnets, because the signatures of new botnets are
unknown and some botnets are polymorphic [17]. Some IRC-based approaches were
developed to overcome this problem. For example, Binkley et al. [18] developed an
anomaly-based system combining IRC statistics and TCP work load, and Karasaridis
et al. [19] applied a passive anomaly-based characterization methodology based on
botnet behavior characteristics. However, these methods have high false positive
rates [17].

Much research has focused on how to detect botnets or trace the botnet master.
Meanwhile, a number of surveys have summarized what has been done and what
future work should be. Feily et al. [20] surveyed botnet mechanisms and botnet
detection techniques according to four classes they identified: signature-based,
anomaly-based, DNS-based, and mining-based. They also compared and evaluated
the advantages and disadvantages of some typical studies from each category.

However, in order to disguise their traces and malicious activities, botnet writers
devote great effort to designing new strategies and mechanisms to fly under
the radar. In this chapter, we discuss two recent advanced botnet mechanisms:


1. Fast Flux (FF for short): A mechanism in which the set of IP addresses
corresponding to a single domain name changes frequently.

2. Domain Flux (DF for short): A mechanism in which a set of domain names is
generated automatically and periodically, corresponding to the URL of a C&C
server.


2.2 The Fast Flux Mechanism and Detection 


2.2.1 The Fast Flux Mechanism 


Fast Flux (FF) is an earlier strategy employed by hackers to evade botnet detection.
With IP fast flux, the mapping between multiple IP addresses and one single domain
name changes rapidly [21]. This technique makes it difficult to block or
take down the C&C server.

Networks that apply fast flux techniques are called fast fluxing networks (FFNs).
Both legitimate and suspicious FFNs show almost the same characteristics, such as
short TTLs and large IP pools [22]. Furthermore, fast flux can be classified into two
categories: single flux and double flux.

In single flux, a domain name may be resolved to different IP addresses in
different time ranges. For example, suppose the same domain name is accessed twice
in a short time period. The first time, a bot sends a DNS query to the DNS
server, which resolves the corresponding IP address as IP_1. With IP_1 in place,
the bot accesses a flux agent FA_1, which redirects the request to the real server,
the “mothership”. The “mothership” then processes the request and responds to FA_1.
Finally, FA_1 relays the response back to the bot. After a short while, the same bot or
other bots may access the same domain name again. However, the mapping between
the domain name and IP_1 has been changed by the hackers. As a result, the DNS server
returns a different IP address, IP_2, to the name service request, and the bot uses
this new address IP_2 to connect to another flux agent FA_2, which redirects the bot to
the “mothership” [22].

Double flux is a more sophisticated method of evading detection than
single flux. It frequently changes both the flux agents and the registration in
DNS servers. That is to say, in addition to the flux agents themselves, the
authoritative domain name server is also part of the fluxing. This provides an
additional layer of redundancy within malware networks. The fluxing nodes repeatedly
register and de-register from the domain name system [21].

Fast fluxing network techniques have been abused by attackers to maintain
their botnets. This is known as a fast fluxing network attack (FFNA). In this case,
almost all compromised computers become fluxing agents. Agents can be added to or
removed from the agent pool dynamically; thus, any mechanism that tries to block
individual agents cannot take down the whole botnet [23].


2.2.2 Fast Flux Detection 


Holz et al. [24] claimed that they were the first to develop a metric to detect fast
flux service networks (FFSNs) empirically. They identified three possible parameters
that could be used to distinguish normal network behavior from that of FFSNs:
the number of IP-domain mappings in all DNS lookups, the number of nameserver
records in one single domain lookup, and the number of autonomous systems in all
IP-domain pairs. Based on these three parameters, they defined a metric, the
flux-score, which was the result of a linear decision function for detecting FFSNs.
A higher score indicated a higher degree of fluxing, and vice versa. They evaluated
their metric by tenfold cross-validation using a 2-month observation data set.
Results showed that their method was able to distinguish normal network behavior
from FFSNs with a very low false positive probability.
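
A linear decision function over the three parameters can be sketched as below. The
weights, the threshold, and the field names are hypothetical values chosen only to
make the example run; they are not the values fitted in [24].

```python
from dataclasses import dataclass


@dataclass
class DomainObservation:
    """Features collected for one domain (field names are illustrative)."""
    unique_a_records: int   # IP-domain mappings seen over all lookups
    ns_records: int         # nameserver records in a single lookup
    unique_asns: int        # autonomous systems covering the returned IPs


# Hypothetical weights and threshold, chosen only for this example.
WEIGHTS = (1.0, 0.5, 10.0)
THRESHOLD = 50.0


def flux_score(obs: DomainObservation, weights=WEIGHTS) -> float:
    """Linear decision function: weighted sum of the three parameters."""
    w_a, w_ns, w_asn = weights
    return (w_a * obs.unique_a_records
            + w_ns * obs.ns_records
            + w_asn * obs.unique_asns)


def looks_fluxing(obs: DomainObservation, threshold: float = THRESHOLD) -> bool:
    """Flag a domain whose score exceeds the (assumed) threshold."""
    return flux_score(obs) > threshold


benign = DomainObservation(unique_a_records=3, ns_records=2, unique_asns=1)
suspect = DomainObservation(unique_a_records=30, ns_records=5, unique_asns=12)
```

In practice the weights and threshold would be learned from labeled domains, which
is what the cross-validation described above evaluates.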

The detection methods that focus on domains related to IP addresses with short
TTLs in DNS query results [24, 25] have some limitations. In 2009, Zhou et al. [23]
overcame these limitations by applying a behavior
analysis model. To achieve this, they began by characterizing the behaviors of FF
domains at some locations around these FF domains. Based on the analysis of those
behaviors, they presented an analytical model, which showed the number of DNS
queries required to confirm an FF domain. In addition, they sped up the detection
with two schemes. The first scheme was to associate IP addresses with the query
results from multiple DNS servers; the second scheme was to correlate query
results across multiple possible FF domains. They also proved that these correlation
schemes sped up detection. To avoid a single
point of failure and improve scalability, they developed a collaborative intrusion
detection architecture, LarSID, to support distributed correlation using a
peer-to-peer publish-subscribe mechanism for evidence sharing. Their results showed
that their decentralized model was 16 to 10,000 times faster than the previous
centralized model [25] with the same correlation schemes [23].

Caglayan et al. [26] developed a real-time detection model for fast flux service 
networks (FFSN). They proposed to monitor the DNS activity of a web site at the 
minute level using both active and passive methods in a distributed fashion. The 
model included three key components: sensors, FF monitor database, and fast flux 
monitor (FFM). For the first key component, there were three kinds of sensors: 
active sensors, passive sensors, and analytic sensors. Active sensors were designed
to monitor several indicative parameters, including TTL, FF activity index, and
footprint index. Passive sensors replicated the function of the active
FFM sensors by leveraging DNS replication services. Analytic sensors were mainly
responsible for checking whether the IP addresses used by a certain web site appeared
in a blacklist.

The FF monitor database was designed to record information collected from the
sensors, such as known FFSNs and zombie IPs. By analyzing the data in the
FFM database, analytical knowledge could be harvested, for example,
the size of FFSNs, growth rate estimates, the social network of an FFSN (where IP
addresses were shared by diverse FFSNs), the footprint of an FFSN for a given ISP in
a given country, and so on. Finally, they developed an FFM classifier, which applied
a Bayesian belief network to integrate multiple active and passive sensors. This
classifier was then trained to calculate a prediction confidence. They demonstrated
empirically how their model could generate reports to assist security analysts in
evaluating the security of a web site with acceptable accuracy. To improve the
detection accuracy, the model calculated decisions every 10 min. The more samples
taken, the higher the accuracy obtained [26].

Perdisci et al. [27] developed a recursive DNS (RDNS) tracing methodology to
detect malicious flux service networks in the wild. In their model, a sensor was
deployed in front of the RDNS server; it passively monitored DNS traffic and
stored information about FF domains in a centralized data collector. Furthermore,
to target botnets, they developed pre-filtering rules to identify
malicious FF networks. They characterized a malicious FF network by
four features: a short TTL, the change rate of the set of resolved IPs returned
from each query, a large number of resolved IPs, and resolved IPs scattered across
different networks. Based on these rules, they developed filters to gather the traffic
they required. In addition, the filters in the sensors stored historic information.
Unlike most previous works, they conducted a fine-grained analysis of the collected
data. Firstly, they clustered closely related domains based on their common features.
Secondly, the clusters of domains were classified according to the resolved
IP addresses. Finally, they applied a statistical supervised learning method to build
a network classifier to distinguish malicious flux services from legitimate ones.
Results indicated that their model was able to distinguish malicious FF networks
from benign ones clearly.

Yu et al. [22] pointed out that one critical step in detecting botnet fast flux is to
distinguish the fast fluxing attack network (FFAN) from the benign fast fluxing service
network (FFSN). Their idea was an improvement on the method discussed in [24],
which was not able to distinguish benign networks from malicious ones. They identified
FFANs by observing the agent lifespan. They showed that all agents in an FFSN should
stay alive almost 24/7. However, the alive time of FFAN bots is unpredictable to
some extent, because the compromised computers cannot be physically controlled by
the FFAN master. Based on this lifespan difference between FFSN and FFAN, the
authors proposed two metrics and developed a monitoring system. The first metric
they defined was the average online rate (AOR), which was measured once per
hour within a 24 h time interval. The AOR of an FFSN should be close to 100 %.
However, FFAN fluxing agents (bots) were outside the attackers’ control, thus their AOR
was usually far below the AOR of an FFSN. The second metric was the minimum
available rate (MAR), which was the ratio of the number of times available to
the total number of measurements. For the same reason, the MAR of an FFAN was far
lower than that of an FFSN. Based on the two metrics, they developed a flux agent
monitoring system consisting mainly of four components. The first was a digging tool
that gathered information and added new IPs to the IP record database. The second was
the agent monitor, which sent HTTP requests to the IPs in the IP record database and
stored the responses. The third was the IP lifespan record database, which stored the
service status: 1 for available service, and 0 for unavailable service. The last key
component was a detector that differentiated FFAN from FFSN using the IP lifespan
records and the two metrics. Experimental results demonstrated that their system was
able to distinguish FFAN from FFSN clearly, because all benign FFSNs had high and
stable AOR and MAR, whereas those of the FFANs were much lower and less stable.
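
The two metrics can be computed from the 0/1 lifespan records as sketched below.
This is one possible reading of the definitions above, not the formulas of [22]:
AOR is taken as the mean over hourly probes of the fraction of agents that
answered, and MAR as the lowest per-agent availability ratio. The record layout
and the sample data are assumptions.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical lifespan records: IP -> hourly probe results
# (1 = service available, 0 = service unavailable).
Lifespan = Dict[str, List[int]]


def average_online_rate(records: Lifespan) -> float:
    """Mean, over the hourly probes, of the fraction of agents that answered."""
    hours = zip(*records.values())
    return mean(sum(hour) / len(hour) for hour in hours)


def minimum_available_rate(records: Lifespan) -> float:
    """Lowest per-agent ratio of successful probes to total probes."""
    return min(sum(probes) / len(probes) for probes in records.values())


# A benign service network whose agents are almost always up, and an attack
# network whose agents (compromised hosts) come and go.
ffsn = {"198.51.100.1": [1] * 24, "198.51.100.2": [1] * 23 + [0]}
ffan = {"203.0.113.1": [1] * 6 + [0] * 18, "203.0.113.2": [0, 1] * 12}
```

With these toy records both metrics are close to 1 for the service network and far
lower for the attack network, which is the separation the detector relies on.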


2.3 The Domain Flux Mechanism and Detection 


Because it relies on a single domain name, fast flux suffers from a single point of
failure once the fluxing is identified. Therefore, hackers developed a more
survivable mechanism: domain flux.


2.3.1 The Domain Flux Mechanism 


Stone-Gross et al. [3] pointed out that some recent botnet programs, such as
Torpig, were using domain flux to sustain their botnets. Inspired by [28-30], they
discussed domain-generation techniques and provided a research report on how they
cooperated with the FBI to take over an advanced domain-flux-based botnet [3].

Domain flux is based on the idea of generating domain names through a domain
generation algorithm (DGA). Both the C&C server and its bots follow the same
algorithm seeded with the same value to obtain consistent dynamic domain names.
Bots try to contact the C&C server and other servers controlled by the botnet master
according to a domain list until one DNS query succeeds. If the current domain has
been blocked or suspended by authorities, bots will try to calculate other domain
names using the DGA. The key idea is that the algorithm must ensure that all bots
generate the same domains from the same seed. Stone-Gross and his co-authors
revealed that Torpig first calculated sub-domains using the current week and year,
independent of the current day, and then appended the top level domain (TLD). The
domains generated might be “weekyear.com”, “weekyear.biz”, and so on. The
bots used these auto-generated domain names to contact the C&C server. If
this failed, the bots used the day information to calculate a “daily domain”, such as
“day.com” or “day.net”. If none of these domains could be resolved, the bots tried to
use the hard-coded domain names in a configuration file as the last option [3].
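
From the defender's side, once the generation scheme is known, the candidate names
for any date can be pre-computed and then registered or blocked. The sketch below
is a toy date-seeded generator written only to show this determinism. The hashing
scheme, the label length, and the TLD list are hypothetical and are not Torpig's
actual algorithm.

```python
import datetime
import hashlib
from typing import List

TLDS = (".com", ".net", ".biz")   # illustrative list of top level domains


def _label(seed: str, length: int = 10) -> str:
    """Map a seed string to a lowercase domain label via SHA-256."""
    digest = hashlib.sha256(seed.encode("ascii")).digest()
    return "".join(chr(ord("a") + b % 26) for b in digest[:length])


def candidate_domains(today: datetime.date) -> List[str]:
    """Weekly domains first, then daily domains, mirroring the fallback order."""
    year, week, _ = today.isocalendar()
    weekly = _label(f"w{week}-{year}")        # same for every day of the week
    daily = _label(f"d{today.isoformat()}")   # changes every day
    return [weekly + t for t in TLDS] + [daily + t for t in TLDS]


# Two days in the same ISO week share the weekly names but not the daily ones.
monday = candidate_domains(datetime.date(2009, 1, 26))
friday = candidate_domains(datetime.date(2009, 1, 30))
```

Because the output depends only on the date, anyone holding the algorithm obtains
the same list as the bots, which is what makes a takeover by early registration
possible.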


2.3.2 Domain Flux Detection 


For botnet writers, it is critical that bots can phone “home” using the domain flux
technique; for defenders, it is equally important to use this key component to
defeat botnets. There has been considerable work on domain flux detection and
botnet takeover.

In 2009, Stone-Gross et al. conducted in-depth research on taking over the
real-world Torpig botnet [3]. By reverse engineering the domain
generation algorithm (DGA) of Torpig, they found that the Torpig owners did not
register all possible domains in advance. Therefore, they registered the relevant
domain names of the Torpig C&C server before the botnet owners did. As a result,
they took over the botnet for about 10 days. The bots of Torpig treated their server
as the C&C server. The authors recorded much information about the botnet.
They estimated the size of Torpig by counting node identifiers (N_id) that were
unique in Torpig. They also analyzed the advantages of this method compared
with counting IP addresses, which can be misled by DHCP. Their method was quite
different from the previous works in [7] and [31]. Rajab et al. [7] focused on
estimating the size of IRC-based botnets by querying DNS server caches to estimate
the number of bots that had connected to the C&C server. Kanich et al. [31] worked
on estimating the size of P2P Storm networks using active probing and crawling of
the Overnet distributed hash table (DHT).

Ma et al. [32] applied a supervised machine learning method to detect malicious web
sites and prevent users from visiting them, based on automated URL classification.
It was a lightweight model that investigated the lexical features and host-based
properties of malicious URLs. The lexical features that they selected were the
length of the entire URL and the number of dots in the URL, together with a
bag-of-words representation. For host-based features, they extracted IP address
properties, WHOIS (registrar) properties, domain name properties (e.g., TTL), and
geographic properties (physical location, link speed, and so on). To train and
evaluate the features, they applied three classification models (Naive Bayes,
Support Vector Machine (SVM), and Logistic Regression) to data sets from four
different sources (two malicious and two legitimate). They found that WHOIS and
lexical features provided rich information. Moreover, combining all features
into a full feature set reached the highest detection accuracy. To compare
full-feature performance with the traditional blacklist method, they used a ROC graph
to show how the full feature set made the difference. Finally, they showed how their
classifiers automatically selected the most predictive features from a large number
of features and achieved a modest false positive rate.
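
The lexical part of such a feature set can be sketched with the standard library
alone. The feature names, the tokenization rule, and the choice to keep hostname
and path tokens separate are assumptions made for this example; host-based
features (WHOIS, TTL, geography) need external lookups and are left out.

```python
import re
from collections import Counter
from typing import Dict
from urllib.parse import urlsplit


def lexical_features(url: str) -> Dict[str, float]:
    """Length, dot count, and bag-of-words token counts for one URL."""
    parts = urlsplit(url)
    features: Dict[str, float] = {
        "url_length": float(len(url)),
        "dot_count": float(url.count(".")),
    }
    # Tokens are counted separately for the hostname and the path, split on
    # any run of non-alphanumeric characters.
    for prefix, text in (("host", parts.hostname or ""), ("path", parts.path)):
        tokens = Counter(
            t for t in re.split(r"[^A-Za-z0-9]+", text.lower()) if t
        )
        for token, count in tokens.items():
            features[f"{prefix}:{token}"] = float(count)
    return features


feats = lexical_features("http://login.example.com/secure/login.php")
```

The resulting sparse dictionary can be fed to any linear classifier that accepts
named features.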

Later in 2009, Ma et al. [33] developed an online learning approach based on
the same two groups of features as in their previous work [32]. Compared with the
previous study, the new model could identify suspicious URLs over time using a live
feed of labeled URLs from a large web mail provider. The live feed made this model
more appropriate for online learning and for processing a large number of URLs. They
also pointed out and demonstrated that a continuous feed of new features was the key
to detecting new malicious URLs. In their feature selection, lexical and host-based
types accounted for 62 and 38 %, respectively. The online learning proceeded
over a sequence of feature-vector and label pairs. For each pair, the
algorithm made a label prediction with a linear classifier. After obtaining the
actual label, the algorithm compared it with the prediction.
If they did not match, the algorithm recorded an error.

Ma et al.’s online method was a combination of classical and recent algorithms,
including four sub-algorithms. The first sub-algorithm was the perceptron, in which
the linear classifier updates its weight vector whenever it makes a mistake.
The second was stochastic gradient descent applied to logistic
regression. This method provided an online means of optimizing an objective function
by approximating the gradient of the original objective. The objective function was
defined as a sum of the samples’ individual objective functions, and the model
parameters were updated incrementally by the gradients of individual objectives.
The third sub-algorithm was the passive-aggressive algorithm, which was used to
make a minimum change to correct any mistakes and low-confidence predictions.
The last sub-algorithm was the confidence-weighted (CW) algorithm, in which less
confident weights are updated more aggressively than highly confident ones. It also
modeled uncertainty in the weights using a Gaussian distribution to describe the
per-feature confidence. CW could make fine-grained distinctions among the confidence
of all features’ weights. Therefore, it was appropriate for detecting malicious URLs
as long as dynamic features could be constantly introduced. Finally, they obtained a
high detection accuracy of up to 99 % over a balanced data set [33].
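
The simplest of the four sub-algorithms, the perceptron, shows the predict-then-update
loop of online learning. The sketch below uses sparse dictionaries so that
previously unseen features can appear at any point in the stream. The feature
names and the four sample URLs are invented for illustration, and this is not the
code of [33].

```python
from typing import Dict, Iterable, Tuple

Features = Dict[str, float]


def predict(weights: Features, x: Features) -> int:
    """Linear classifier: +1 (malicious) if w.x > 0, otherwise -1 (benign)."""
    score = sum(weights.get(name, 0.0) * value for name, value in x.items())
    return 1 if score > 0 else -1


def train_online(stream: Iterable[Tuple[Features, int]]) -> Tuple[Features, int]:
    """Process (features, label) pairs once, in order.

    Returns the final weights and the number of prediction mistakes.
    """
    weights: Features = {}
    mistakes = 0
    for x, y in stream:
        if predict(weights, x) != y:
            mistakes += 1
            # Perceptron rule: w <- w + y * x, applied only after a mistake.
            for name, value in x.items():
                weights[name] = weights.get(name, 0.0) + y * value
    return weights, mistakes


# Hypothetical labeled feed: +1 = malicious URL, -1 = benign URL.
stream = [
    ({"bias": 1.0, "host:paypa1": 1.0}, 1),
    ({"bias": 1.0, "host:example": 1.0}, -1),
    ({"bias": 1.0, "host:paypa1": 1.0, "path:login": 1.0}, 1),
    ({"bias": 1.0, "host:example": 1.0, "path:docs": 1.0}, -1),
]
weights, mistakes = train_online(stream)
```

The passive-aggressive and confidence-weighted algorithms keep the same loop and
change only how large the update is and which weights it touches.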

In 2010, Jiang et al. [34] proposed a light-weight anomaly detection approach
using DNS failure graphs based on failed DNS queries. They captured all
interactions between hosts and unresolvable domain names, which could be considered
as auto-generated domains used by botnets. Edges represent associations
between hosts and domains. They learnt from previous studies that a small portion
of failed DNS queries come from human errors and misconfigurations [35],
while a large part are generated by malicious activities [36, 37]. In a botnet,
as all bots use the same algorithm to generate sub-domains, the queries for these
domains lead to correlated failures, which form dense subgraphs
in a DNS failure graph. They believed that such subgraphs would show some
interaction patterns. To confirm their assumption, they gathered DNS query data
from several major DNS servers in a large campus network for 3 months. After that,
they recursively decomposed the DNS failure graph and extracted dense subgraphs
by applying a statistical graph decomposition technique, which is an extension of
the tri-nonnegative matrix factorization (TNMF) algorithm [38]. By analyzing the
structural properties, they classified the subgraphs into three categories: host-star,
DNS-star, and bi-mesh. By referring to external data sources, such as domain
name blacklists, they found that bi-mesh structures, where a group of hosts is
strongly associated with a group of domains, reflect botnet activities [34].
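
The idea of correlated failures can be illustrated on a tiny bipartite graph. Note
that the sketch below does not implement the TNMF-based decomposition of [34]. As a
deliberately simpler stand-in, it flags pairs of hosts whose sets of failed
domains overlap strongly, measured by the Jaccard index. The function names, the
0.5 threshold, and the sample queries are assumptions.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def failure_graph(failed_queries: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Bipartite DNS failure graph stored as host -> set of unresolved domains."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for host, domain in failed_queries:
        graph[host].add(domain)
    return dict(graph)


def correlated_host_pairs(graph: Dict[str, Set[str]],
                          min_jaccard: float = 0.5) -> List[Tuple[str, str]]:
    """Host pairs whose failed-domain sets overlap strongly (Jaccard index)."""
    pairs = []
    for a, b in combinations(sorted(graph), 2):
        overlap = len(graph[a] & graph[b]) / len(graph[a] | graph[b])
        if overlap >= min_jaccard:
            pairs.append((a, b))
    return pairs


queries = [
    ("h1", "qzxkv.com"), ("h1", "plmwr.net"), ("h1", "tbhgd.biz"),
    ("h2", "qzxkv.com"), ("h2", "plmwr.net"), ("h2", "tbhgd.biz"),
    ("h3", "qzxkv.com"), ("h3", "plmwr.net"),
    ("h4", "exmaple.com"),   # a lone typo, not correlated with other hosts
]
pairs = correlated_host_pairs(failure_graph(queries))
```

Hosts h1, h2, and h3 fail on the same generated names and end up connected to each
other, a small instance of the bi-mesh pattern, while the host with a single typo
stays isolated.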

Later, Prakash et al. [39] developed PhishNet to protect systems from phishing
attacks. As shown in Symantec’s MessageLabs report [14], 83.2 % of spam was
sent through botnets in June 2009. Thus, PhishNet should be able to detect the
auto-generated domain names in spam sent through botnets. It is a blacklisting-based
method, improved to overcome the limitations of traditional
blacklisting methods. They pointed out that it was easy to evade URL blacklisting 
because blacklisting methods perform exact matching between target URLs and 
entries in the list. Two key components in their model can help to overcome this 
limitation. In the first component, there are five heuristics to enumerate simple 
combinations of known phishing URLs to discover new phishing sites. It works 
in an offline fashion, examines current blacklists, and generates new URLs based 
on these heuristics systematically. It was also responsible for confirming whether 
the new generated URLs are indeed malicious through DNS queries and content 
matching in an automatic fashion. The five heuristics came from URL lexical 
similarities and their own observations in the PhishTank database. They studied 
these five heuristics to generate new URLs from existing Phishing URLs in a 
blacklist. The first heuristic was to replace the top-level domains (TLD). The second 
one was the IP address equivalence, which meant that they clustered phishing URLs 
by close IP addresses. Then, they created new URLs by combining different host 
names and pathnames in the same clusters. The third heuristic was the directory 
structure similarity. They believed that similar paths might lead to similar sets
of file names. Therefore, they grouped directories with similar structures, and then
exchanged the filenames among URLs within the same group. The fourth one was
query string substitution. For similar URLs, they exchanged the query
string that follows the question mark in a URL. The last heuristic was the brand
name equivalence. Some URLs use the same URL structure but different brand
names. To confirm their heuristics, the existence of the generated child URLs
was tested by a verification process. URLs that could not be resolved by DNS
lookups were filtered out. For resolved URLs, the model would establish
connections to the corresponding servers first and then use the HTTP
GET method to retrieve content from them. If the status code was 200/202 (successful),
a content similarity would be computed between the parent and child URLs. The
second component was an approximate matching algorithm, which performed an 
approximate match of a new URL against an existing blacklist. The algorithm 
performed the match by measuring the syntactic and semantic variations. To achieve 
this, the algorithm broke the input URLs into four different entities: IP address, host
name, directory structure and brand name. Each individual entity would be scored 
and then a final score would be computed by combining all individual scores. If the 
final score exceeded a certain threshold, the URL would be marked as suspicious. 
They showed that the system could keep the false negative rate under 3 %, and false 
positive rate under 5 %. 
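
As a minimal sketch of how the first and fourth heuristics could generate child URLs, consider the following Python fragment. The function names, the candidate TLD list, and the example URLs are assumptions made for illustration only; they are not taken from [39], and the DNS and content verification steps are omitted.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical TLD list; the text does not enumerate the TLDs that were used.
CANDIDATE_TLDS = ["com", "net", "org"]


def replace_tld(url, tlds=CANDIDATE_TLDS):
    """Heuristic 1: swap the top-level domain of a blacklisted URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # note: any port number is dropped in this sketch
    if "." not in host:
        return []
    stem, current = host.rsplit(".", 1)
    children = []
    for tld in tlds:
        if tld != current:
            netloc = f"{stem}.{tld}"
            children.append(urlunsplit((parts.scheme, netloc, parts.path, parts.query, "")))
    return children


def swap_query_strings(url_a, url_b):
    """Heuristic 4: exchange the query strings of two similar URLs."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (
        urlunsplit((a.scheme, a.netloc, a.path, b.query, "")),
        urlunsplit((b.scheme, b.netloc, b.path, a.query, "")),
    )
```

Each generated child URL would still have to pass the verification process described above before being added to a blacklist.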

Yadav et al. [40] researched how to investigate alphanumeric unigrams and 
bigrams (two consecutive characters) to identify domains generated algorithmically. 
They were motivated by the observation that auto-generated domains are quite 
different from legitimate ones in terms of spelling or pronounceable features. They 
developed a few detection metrics based on signal detection theory and statistical
learning techniques. Their method focused on detecting domains generated by
pseudo-random string generation algorithms and dictionary-based generators, which
produced pronounceable words that were not in English dictionaries. Following that,
they implemented their model in two parts. Firstly, they grouped DNS queries by 
Top Level Domain (TLD), IP-addresses they were mapped to, or the connected 
component they belonged to. Secondly, they used the metrics to characterize the 
distribution of the alphanumeric characters or bigrams. They employed three metrics 
(the KL-distance, the Jaccard index, and the edit distance) to distinguish a set 
of legitimate domain names from malicious ones. They also performed in-depth
per-domain analysis, per-IP analysis, and component analysis to evaluate the performance.
Through the experiments, they found that the Jaccard index performed
the best, followed by the edit distance and the KL divergence. They
highlighted that their methodology could be used to detect unknown and
unclassified botnets.
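
To illustrate the bigram idea, the following Python sketch computes the Jaccard index between the bigram sets of two domain labels. It is only an illustration under our own assumptions: the helper names and the way a label is compared with a reference list are hypothetical and do not reproduce the exact procedure of [40].

```python
def bigrams(label):
    """Return the set of character bigrams (two consecutive characters) in a label."""
    label = label.lower()
    return {label[i:i + 2] for i in range(len(label) - 1)}


def jaccard_index(a, b):
    """Jaccard index |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_jaccard(label, reference_labels):
    """Average bigram overlap between one label and a list of known labels."""
    scores = [jaccard_index(bigrams(label), bigrams(ref)) for ref in reference_labels]
    return sum(scores) / len(scores)
```

A label that shares few bigrams with known legitimate names, such as a pseudo-random string, receives a low score and would be a candidate for further inspection.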


2.4 Modelling Malicious Networks 


Models for network virus infection and curing have been explored extensively.
Based on epidemiology research, Zou et al. [41] proposed a number of models for
monitoring and early detection of Internet worms. As they pointed out, this kind of
model is appropriate for a system that consists of a large number of vulnerable
hosts; in other words, the model is effective at the early stage of a virus
outbreak, and its accuracy drops otherwise. The model makes the following
assumptions.

1. There are only two possible states for a given node in the network, healthy or
infected.
2. The nodes stay in the system forever, and there is no curing process.


As a variant of the epidemic category, Sellke et al. [42] proposed a stochastic
branching process model for characterizing the propagation of Internet worms. The
model focuses on the number of compromised computers against the
number of worm scans, and they presented a closed-form expression for this relationship.

Dagon et al. [43] extended the model of [41] by introducing time zone
information a(t), and built a model to describe the diurnal effect on the number of live
members of botnets. The impact of side information on the
spreading behavior of network viruses has also been explored, such as network topology
information [44-46] and the distribution of vulnerable hosts [47]; these studies focus
more on the life cycle of infections in theory.

The features of peer-to-peer networks have been explored extensively [48-50].
Stutzbach and Rejaie studied three widely deployed p2p systems:
an unstructured file-sharing system (Gnutella), a content-distribution system
(BitTorrent), and a distributed hash table (Kad). They found that the group-level
properties of the dynamics exhibit similar behavior across all three applications,
although there are differences in terms of per-peer properties. The long-term
observation of the Kad peer-to-peer network by Steiner et al. also confirms the
findings in [49].

As we have seen, epidemic models are currently the mainstream methods for modelling
viruses or malicious networks in cyberspace. We therefore present the basic concepts of
epidemic modelling in this section. We refer interested readers to [51] and [52];
recent developments can be found in dedicated journals, such as Elsevier’s
Mathematical Biosciences.

Epidemic theory has a long history in the study of biological infectious diseases.
Between 1927 and 1933, Kermack and McKendrick published a series of papers titled
“Contributions to the mathematical theory of epidemics”. These papers are often seen as
the basis of further research in mathematical modeling of the spread of infectious
diseases. Depending on different assumptions and scenarios, we usually have
different epidemic models, such as the naive model, the susceptible-infectious
model (SI in short), the susceptible-infectious-susceptible model (SIS in short), and
the susceptible-infectious-recovered model (SIR in short).

In general, there are three different states for each individual in epidemic
modeling: susceptible (S state), infectious (I state), or recovered (R state). Any
individual of a studied population stays in one of the states. The susceptible
individuals are those who have not been infected, but could be infected; the infectious
individuals are those who have the capability of spreading a disease; and the
recovered individuals are those who used to be infected by a disease, but have
been cured.


In the SI model, we suppose that the total population is finite, denoted as N, and that there
is no curing process for the disease. The dynamics are described as follows.


$$\frac{dI_t}{dt} = \beta I_t (N - I_t), \qquad (2.1)$$


where $I_t$ is the number of infected hosts at time t, and $\beta$ is the pairwise rate of infection in
epidemic theories. The solution of Eq. (2.1) is


$$I_t = \frac{I_0 N}{I_0 + (N - I_0)\, e^{-\beta N t}}, \qquad (2.2)$$


where $I_0$ is the number of initially infected hosts. At the early stage of an outbreak,
when $I_t \ll N$, this is approximately $I_0 e^{\beta N t}$.
The discrete form of this model is as follows.


$$I_t = (1 + \alpha\Delta) I_{t-1} - \beta\Delta I_{t-1}^2, \qquad (2.3)$$


where $\Delta$ is the unit of time, and $\alpha = \beta N$ is the infection rate, which represents the
average number of vulnerable hosts that can be infected by one infected host per
time unit.
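
The relation between the continuous and the discrete form can be checked numerically. The sketch below iterates the recurrence of Eq. (2.3) and compares it with the standard logistic solution of Eq. (2.1); the function names and all parameter values are our own illustrative assumptions.

```python
import math


def si_closed_form(t, n, i0, beta):
    """Logistic solution of dI/dt = beta * I * (N - I) with I(0) = i0."""
    return i0 * n / (i0 + (n - i0) * math.exp(-beta * n * t))


def si_discrete(steps, n, i0, beta, delta):
    """Iterate I_t = (1 + alpha*delta) * I_{t-1} - beta*delta * I_{t-1}^2, alpha = beta*N."""
    alpha = beta * n
    infected = float(i0)
    series = [infected]
    for _ in range(steps):
        infected = (1 + alpha * delta) * infected - beta * delta * infected * infected
        series.append(infected)
    return series


if __name__ == "__main__":
    # Hypothetical values: 1000 hosts, one initial infection, time step 0.01.
    series = si_discrete(steps=2000, n=1000, i0=1, beta=0.001, delta=0.01)
    print(series[-1], si_closed_form(20.0, 1000, 1, 0.001))
```

With a small time unit, the discrete series stays close to the continuous curve and saturates at N, because no host is ever cured in this model.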


2.4.2 The SIS Model 


In the SIS epidemic model, there is a curing process. An infected individual can be
cured, but it does not develop immunity to the disease. As a result, it could be
infected again. In terms of states, a cured individual stays in the susceptible state. It
is also assumed that there is no vertical transmission of the disease (all individuals
are born susceptible) and there are no disease-related deaths. We assume that the
birth rate equals the death rate, so that the total population size is constant. Let $\beta$ be
the infection rate, and $\alpha$ be the recovery rate. The differential equations
describing the dynamics of an SIS epidemic model are


$$
\begin{aligned}
\frac{dS}{dt} &= -\beta S I + \alpha I, \\
\frac{dI}{dt} &= \beta S I - \alpha I.
\end{aligned} \qquad (2.4)
$$


If we assume that the birth rate does not equal the death rate, then the size of the
total population is variable. Let $\lambda$ be the birth rate; then we have the dynamics as
follows.


$$\frac{dI}{dt} = \beta S I - (\alpha + \lambda) I. \qquad (2.5)$$


2.4.3 The SIR Model 


In the SIR epidemic model, when individuals become infected, they develop 
immunity, and will not be infected in the future, and enter the immune state R. The 
SIR epidemic model has been applied to childhood diseases, such as chickenpox, 
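measles, and mumps. The differential equations describing the dynamics of an SIR
epidemic model are


$$
\begin{aligned}
\frac{dS}{dt} &= -\beta S I, \\
\frac{dI}{dt} &= \beta S I - \alpha I, \\
\frac{dR}{dt} &= \alpha I.
\end{aligned} \qquad (2.6)
$$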
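
A simple way to explore the SIR dynamics is to integrate the three equations with a forward Euler scheme, as in the following sketch. The step size, the parameter values, and the function name are assumptions chosen for illustration; a production study would normally use a dedicated ODE solver.

```python
def simulate_sir(s0, i0, r0, beta, alpha, dt, steps):
    """Forward Euler integration of dS/dt = -bSI, dI/dt = bSI - aI, dR/dt = aI."""
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt   # flow from S to I in one step
        new_recoveries = alpha * i * dt      # flow from I to R in one step
        s, i, r = s - new_infections, i + new_infections - new_recoveries, r + new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical population of 1000 hosts with 10 initially infected.
    history = simulate_sir(990, 10, 0, beta=0.0005, alpha=0.1, dt=0.1, steps=1000)
    print(history[-1])
```

Because every host that leaves one state enters another, the sum S + I + R stays constant throughout the simulation, which is a convenient sanity check.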


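If we assume that the birth rate does not equal the death rate, then the size of
the total population is variable. Similarly, let $\lambda$ be the birth rate; the previous model
becomes


$$
\begin{aligned}
\frac{dS}{dt} &= -\beta S I + \lambda (I + R), \\
\frac{dI}{dt} &= \beta S I - (\alpha + \lambda) I, \\
\frac{dR}{dt} &= \alpha I - \lambda R, \\
N &= S + I + R.
\end{aligned} \qquad (2.7)
$$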


Deterministic models, represented by differential equations of various forms, were the
first tools and remain popular. In these models, the sizes of the susceptible and
infectious populations are assumed to be definite functions of time. These models
can describe the dynamical inter-relations among the rates of change and population
sizes. The mathematical theories for this type of model are well developed, and they
are suitable for making predictions.

Of course, these three models are only a small part of epidemic modelling. There
are plenty of other modelling methodologies, such as stochastic modelling and random
graph based modelling. Interested readers can search for related references.


Chapter 3
DDoS Attack Detection


Abstract In this chapter, we study detection methods for DDoS attacks, covering
feature based detection methods, network traffic based detection methods,
and the detection of attacks that mimic legitimate network events. Each detection
method is mathematically modelled to support readers' possible further work in the
field.


3.1 Introduction 


To defend against DDoS attacks, researchers have designed and implemented
various countermeasures. In general, these countermeasures consist of three components:
detection [1-6], defense (or mitigation) [3, 7-10], and IP trace back [11-13].
Among the three components, detection is the first and
the most important step in fighting against DDoS attacks.

In general, DDoS detection methods include activity profiling [14, 15], packet 
filtering [16, 17], sequential change-point detection [1,2], wavelet analysis [18, 19], 
and so forth. All these methods are based on specific features or fingerprints of 
DDoS attacks. Unfortunately, it is very easy for hackers to simulate the features 
of legitimate network traffic to fool detection algorithms. For example, due to the 
open architecture of the Internet, hackers can spoof the source IP addresses of attack 
packets according to the real Internet IP address distribution to disable the source 
address distribution based detection algorithms [20,21]. Attackers can also change 
the TTL value of attack packets according to the real hop distance between bots and 
the victim in order to fool the hop-count detection methods [2,21]. In addition to 
these, attackers also mimic the behavior of flash crowds [1,22], which are sudden 
increases of legitimate traffic, to disguise their attacks. 

The majority of current DDoS detection methods are based on specific attack
features [2, 4-6], and are therefore passive and incapable of detecting new attacks.
The entropy of attack flows is a metric that is independent of specific attack features.
The entropy detector mentioned in the survey [22] came from [14]; it has
the potential to raise the alarm for a crowd access, but it is incapable of
discriminating DDoS attacks from surges of legitimate accesses (e.g., flash
crowds). Lee and Xiang used relative entropy to measure the similarity of a known
attack set and the suspected data set [23]. However, the relative entropy is not a
perfect metric because it is asymmetric, and using it as a
metric will introduce false positives or false negatives.

Researchers also used stochastic methods in the frequency domain and data 
mining techniques for DDoS detection [24, 25]. Cheng et al. [24] mapped DDoS 
attacks from the time domain to the frequency domain, and further transformed to 
the power spectral density to identify DDoS attacks. Lu et al. [25] adopted data 
mining technology to dig out the DDoS attack information. 

Moreover, information theory based methods are also powerful tools for DDoS
detection. Sengar et al. [26] used information distance (Hellinger’s distance) to
detect VoIP floods in peer-to-peer networks. We used the KL-distance for DDoS
detection at the network layer [27].


3.2 Feature Based Detection Methods 


A common methodology for anomaly detection is to identify the normal patterns of the
objects under study, and to treat any action outside the normal patterns as an anomaly.
This method has been widely applied in security detection. We note that this
strategy inherently produces false negatives and false positives.


3.2.1 Profile Based Detection 


A common strategy to disguise attack sources is IP spoofing. A hop-count filter is an
effective method to fight against source IP spoofing. Wang et al. [2] found
that a hacker cannot falsify the number of hops an IP packet takes to reach its
destination, although he can forge any field in the IP header. Moreover, a receiver can
infer the hop-count information from the Time-to-Live (TTL) field of the IP header. At
the same time, it is easy for an Internet server to establish a table of the IP addresses of its
legitimate clients and their related hop-counts, which is called the IP-to-hop-count
(IP2HC) mapping table. Based on this table, defenders can discriminate
spoofed IPs from legitimate IPs.
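
The following Python sketch shows one way the inference and the table lookup could work. It rests on assumptions that are not stated in the text above: the list of common initial TTL values, the rule of choosing the smallest initial value that is not below the observed TTL, the `tolerance` parameter, and the decision to let unknown clients pass are all illustrative choices.

```python
# Assumed set of common initial TTL values used by operating systems.
COMMON_INITIAL_TTLS = (30, 32, 60, 64, 128, 255)


def infer_hop_count(observed_ttl):
    """Guess the hop-count as (smallest plausible initial TTL) - (observed TTL)."""
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial - observed_ttl
    raise ValueError("TTL must not exceed 255")


def is_spoofed(src_ip, observed_ttl, ip2hc, tolerance=0):
    """Compare the inferred hop-count with the IP2HC table entry for src_ip."""
    expected = ip2hc.get(src_ip)
    if expected is None:
        return False  # unknown client: this sketch makes no decision
    return abs(infer_hop_count(observed_ttl) - expected) > tolerance
```

For example, with the hypothetical table `{"192.0.2.7": 15}`, a packet from 192.0.2.7 arriving with TTL 113 is accepted, while one arriving with TTL 120 is flagged.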

The authors analyzed the detection rate in three cases: single source, multiple
sources, and multiple sources with an awareness of the detection method. We restate
the analysis based on our understanding of the original paper.

In the single source case, suppose the hacker spoofs the IP source address using
the IP address of a legitimate client, $C_i$, of the victim. Moreover, suppose $C_i$ usually
submits $n_i$ legitimate packets to the victim in a given time interval. We assume the
attacker pumps $N_i$ attack packets to the victim in the same time interval. Obviously,
$N_i > n_i$. Among the $N_i$ attack packets, $n_i$ will be treated as legitimate ones. The
detection rate is as follows.


$$Z_s = 1 - \alpha_i, \qquad (3.1)$$


where $\alpha_i$ is called a fraction, following the terms of the original paper.

In the multiple source case with $n$ ($n > 1$) sources, we suppose the hacker uses $n$ legitimate
users’ IP addresses for spoofing, denoted as $C_1, C_2, \ldots, C_n$; they pump
$N_1, N_2, \ldots, N_n$ packets to the victim, and their related fractions are $\alpha_1, \alpha_2, \ldots, \alpha_n$,
respectively. We obtain the detection rate as


$$Z_m = \frac{\sum_{i=1}^{n} (1 - \alpha_i) N_i}{\sum_{i=1}^{n} N_i}. \qquad (3.2)$$


In the case that the spoofed packets are uniformly distributed among the n IP
addresses, we have


$$Z_m = 1 - \frac{1}{n} \sum_{i=1}^{n} \alpha_i. \qquad (3.3)$$

In the third case, an attacker knows of the existence of the detection method,
but has no further information. One anti-detection strategy he or she may take is to
generate initial TTLs within a range of $[h_m, h_n]$ using a given distribution, such as
a uniform distribution or a Gaussian distribution. Suppose the probability of hop-count
$h_k$ is $p_k$ for the chosen distribution. Following the previous definition, we
suppose the fraction of the legitimate IP addresses that have a hop-count of $h_k$ is $\alpha_k$,
then the detection rate in this case is


$$Z_{anti} = 1 - \sum_{k=m}^{n} \alpha_k p_k. \qquad (3.4)$$


We note that, in general, the better a hacker understands the victim, the lower the
detection rate he or she can achieve from the defender’s viewpoint.
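
As a worked illustration of Eqs. (3.2) to (3.4), the sketch below evaluates the detection rates for given fractions. The function names and the numerical values are hypothetical and serve only to show how the formulas behave.

```python
def detection_rate_multi(alphas, packets):
    """Eq. (3.2): packet-weighted average of (1 - alpha_i) over spoofed sources."""
    total = sum(packets)
    return sum((1 - a) * n for a, n in zip(alphas, packets)) / total


def detection_rate_anti(alpha_by_hop, prob_by_hop):
    """Eq. (3.4): 1 - sum_k alpha_k * p_k over the hop-counts the attacker generates."""
    return 1 - sum(alpha_by_hop.get(h, 0.0) * p for h, p in prob_by_hop.items())


if __name__ == "__main__":
    # Two spoofed sources with equal packet counts reduce to Eq. (3.3).
    print(detection_rate_multi([0.1, 0.3], [100, 100]))            # about 0.8
    print(detection_rate_anti({10: 0.2, 11: 0.4}, {10: 0.5, 11: 0.5}))  # about 0.7
```

When all sources send the same number of packets, the first function returns one minus the mean of the fractions, in agreement with the uniform case.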

Packet score [4] is another DDoS detection method deployed at a potential victim end.
Suppose we know the statistical distribution of legitimate packets; then, based on
Bayesian inference, we can obtain the probability that an incoming
packet is legitimate. The basic idea is as follows. Suppose each packet possesses a number of
attributes, such as $A, B, \ldots$. Let the value spaces of the attributes be $\{a_1, a_2, \ldots\}$,
$\{b_1, b_2, \ldots\}$, and so on. They use $N_a$, $N_n$, and $N_m$ to represent the numbers of
attack packets, normal packets, and measured packets (subscripts a, n, and m
stand for attack, normal and measured, respectively). For a given time interval, it is
straightforward that


$$N_m = N_a + N_n. \qquad (3.5)$$

Let function $C(\cdot)$ be a counter. Using the same meaning of subscripts a, n,
and m, we have


$$N_n = \sum_{k \geq 1} C_n(A = a_k) = \sum_{k \geq 1} C_n(B = b_k) = \cdots \qquad (3.6)$$

Similarly,

$$N_a = \sum_{k \geq 1} C_a(A = a_k) = \sum_{k \geq 1} C_a(B = b_k) = \cdots \qquad (3.7)$$

$$N_m = \sum_{k \geq 1} C_m(A = a_k) = \sum_{k \geq 1} C_m(B = b_k) = \cdots \qquad (3.8)$$


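Following these, we obtain the probability distribution for each attribute of a
packet in normal, attack and measured cases.


$$\mathrm{Pr}_n(A = a_i) = \frac{C_n(A = a_i)}{N_n}, \qquad (3.9)$$

$$\mathrm{Pr}_n(B = b_i) = \frac{C_n(B = b_i)}{N_n}, \qquad (3.10)$$


where $i = 1, 2, \ldots$.

Likewise, we have $\mathrm{Pr}_a(A = a_i)$, $\mathrm{Pr}_a(B = b_i)$, ..., as well as $\mathrm{Pr}_m(A = a_i)$,
$\mathrm{Pr}_m(B = b_i)$, .... The joint probability distribution among attributes for normal, attack and
measured cases can be calculated by


$$\mathrm{Pr}_n(A = a_i, B = b_j, \ldots) = \frac{C_n(A = a_i, B = b_j, \ldots)}{N_n}, \qquad (3.11)$$

$$\mathrm{Pr}_a(A = a_i, B = b_j, \ldots) = \frac{C_a(A = a_i, B = b_j, \ldots)}{N_a}, \qquad (3.12)$$

$$\mathrm{Pr}_m(A = a_i, B = b_j, \ldots) = \frac{C_m(A = a_i, B = b_j, \ldots)}{N_m}. \qquad (3.13)$$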


The authors defined the conditional legitimate probability (CLP) for a packet p as
follows.


$$\mathrm{CLP}(p) = \Pr\{p = \text{legitimate} \mid p.A = a_p,\ p.B = b_p, \ldots\}, \qquad (3.14)$$


where p.X denotes the attribute X of packet p.
With all these parameters in place, we can use Bayesian inference to calculate
the packet score (the probability of legitimacy) of packet p as follows.


$$\mathrm{CLP}(p) = \frac{\Pr\{(p = \text{legitimate}) \cap (A = a_p,\ B = b_p, \ldots)\}}{\Pr\{A = a_p,\ B = b_p, \ldots\}}. \qquad (3.15)$$


In order to perform packet discarding, a threshold of CLP(p) is needed. The
authors proposed to dynamically adjust the threshold based on the score distribution
of recent incoming packets and the current level of system overload.
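
Under our reading of the definitions above, the numerator of Eq. (3.15) can be estimated as $N_n \mathrm{Pr}_n(\cdot) / N_m$ and the denominator as $C_m(\cdot) / N_m$, so the score reduces to $N_n \mathrm{Pr}_n(\cdot) / C_m(\cdot)$. The sketch below computes this ratio. It assumes that a nominal joint profile and an estimate of $N_n$ are available from an attack-free baseline; the function name, the attribute tuples, and the numbers are hypothetical and do not reproduce the implementation of [4].

```python
from collections import Counter


def conditional_legitimate_probability(attrs, nominal_profile, n_nominal, measured_counts):
    """Estimate CLP(p) as N_n * Pr_n(attrs) / C_m(attrs), clipped to 1.0.

    attrs           -- tuple of attribute values of the packet, e.g. (protocol, port)
    nominal_profile -- dict mapping attribute tuples to Pr_n (baseline joint probability)
    n_nominal       -- assumed number of legitimate packets N_n in the interval
    measured_counts -- Counter of attribute tuples seen in the interval (C_m)
    """
    seen = measured_counts.get(attrs, 0)
    if seen == 0:
        return 0.0  # nothing measured with these attributes; no score in this sketch
    return min(1.0, n_nominal * nominal_profile.get(attrs, 0.0) / seen)


if __name__ == "__main__":
    profile = {("TCP", 80): 0.5, ("UDP", 53): 0.5}
    measured = Counter({("TCP", 80): 50, ("UDP", 53): 450})  # UDP/53 is flooded
    print(conditional_legitimate_probability(("UDP", 53), profile, 100, measured))
```

Packets whose attribute values are over-represented in the measured traffic relative to the baseline receive a low score and become candidates for discarding.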


3.2.2 Low Rate DDoS Attack Detection 


Low rate DDoS attacks, also called shrew DDoS attacks, feature a
low attack rate and are therefore hard to detect [28]. Due to their mechanism, low rate
DDoS attacks have an inherent characteristic: they submit attack packets
periodically. Based on this feature, Chen and Hwang [1] proposed a spectral analysis
method to detect this kind of low rate attack.

The theoretical tools are explained as follows. For a given sequence of network
traffic, X, we can denote it as $x[1], x[2], \ldots$. The autocorrelation of X is defined as


$$R_{xx}(m) = \frac{1}{N - m} \sum_{n=1}^{N-m+1} x[n]\, x[n + m]. \qquad (3.16)$$


The autocorrelation function reinforces any periodicity that exists in the
original signal. However, it is still a time domain concept, in which the periodicity
is not easy to identify. A Discrete Fourier Transform (DFT) of the autocorrelation can
clearly display the periodicity in the frequency domain. The definition of the DFT is


$$\mathrm{DFT}\left[R_{xx}(m), f\right] = \frac{1}{N} \sum_{m=0}^{N-1} R_{xx}(m)\, e^{-j 2 \pi f m / N}, \qquad (3.17)$$


where $f = 0, 1, \ldots, N - 1$.

The output of the DFT processing on the autocorrelation is called the power spectral
density (PSD). We can easily differentiate a shrew DDoS attack from normal
network traffic, as the PSD of a shrew attack is concentrated in the low frequency
band, while the PSD of normal traffic is much flatter.
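
The two steps can be sketched in a few lines of Python. The code below uses zero-based indexing, sums only over index pairs that lie inside the sample window, and takes the magnitude of the transform; these are our own simplifying assumptions rather than the exact procedure of [1], and the pulse train is synthetic.

```python
import cmath


def autocorrelation(x):
    """R(m) = 1/(N-m) * sum of x[k] * x[k+m] over the valid indices, m = 0..N-1."""
    n = len(x)
    return [sum(x[k] * x[k + m] for k in range(n - m)) / (n - m) for m in range(n)]


def power_spectral_density(x):
    """Magnitude of the DFT of the autocorrelation, for f = 0..N-1."""
    r = autocorrelation(x)
    n = len(r)
    return [
        abs(sum(r[m] * cmath.exp(-2j * cmath.pi * f * m / n) for m in range(n))) / n
        for f in range(n)
    ]


if __name__ == "__main__":
    # Synthetic periodic traffic: one burst every 8 samples in a 64-sample window.
    pulses = [1.0 if k % 8 == 0 else 0.0 for k in range(64)]
    spectrum = power_spectral_density(pulses)
    peak = max(spectrum[1:33])
    print([f for f in range(1, 33) if spectrum[f] > 0.5 * peak])
```

For this idealised pulse train, the energy sits on a few discrete spectral lines at multiples of N/8, which is the kind of signature a spectral detector can look for.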


3.3 Network Traffic Based Detection


Network traffic is an important property of the Internet; therefore, it is also a
powerful feature for DDoS detection at the network layer. Scherrer et al. investigated
Internet traffic patterns thoroughly and proposed the Mean Quadratic Distances to
measure traffic anomalies and discriminate DDoS traffic from flash crowds [29]. Lu
et al. transformed the DDoS detection problem into a signal processing problem
and then employed data mining technologies to extract DDoS attack information
[25]. The wavelets method was applied to complete the task in [30]. Moreover,
a wavelets technique developed in [31] was applied in [32] to observe energy
fingerprints to discriminate DDoS attack flows from legitimate flows.

Because the Internet is managed in a decentralized fashion, most
network traffic based DDoS detection is performed in local area networks. In a
local area network, system administrators can manage and configure the routers. As
a result, the routers can cooperate with each other to detect possible attacks.

The topology of a local area network can be treated as a graph. However, due to
the aggregation feature of DDoS attacks, the attack paths in a local area network
form a tree rooted at the victim. We use Fig. 3.1 as a sample DDoS attack tree
in a local area network for the following discussion.

In Fig. 3.1, routers Re, Ro, Rg and R5 are at the edge of the local area network 
(we call them edge routers), and Ro is the router connected to the victim. 

It is necessary to know what a flow is before we proceed to any analysis.


Definition 3.3.1. Flow. At a given router in a local area network, all the passing 
packets that share the same destination address are categorized as one flow. 


Fig. 3.1 A DDoS attack tree

According to Definition 3.3.1, many flows may coexist at a router in a local area
network. In the case of an ongoing DDoS attack, there exists one flow addressed
to the victim. We call it the attack flow. At the same time, there are many other flows
that are addressed to different destinations. Unlike the source IP addresses
or TTL values of the attack packets, the destination of the attack flow cannot be spoofed or changed
by hackers, as the address of the victim is given. Therefore, flow based detection
is independent of any specific attack features, and it can deal with new types of
flooding attacks.

Once we have flows in place, we need metrics to measure flows for anomaly
detection. Entropy is a fundamental metric in information theory [33]. The entropy
of a discrete random variable X is defined as


$$H(X) = -\sum_{x \in \chi} \Pr\{X = x\} \log \Pr\{X = x\}, \qquad (3.18)$$


where $\chi$ is the sample space of X. The entropy of a random variable X measures the
uncertainty of X, in the unit of bits when the logarithm is taken to base 2.
For our detection purpose, we define flow entropy as follows. 


Definition 3.3.2. Flow Entropy. Following Eq. (3.18), the entropy of the flows at a 
given router is called flow entropy, which represents the randomness of the flows at 
the router. 


In general, the flow entropy of a router is stable in non-DDoS attack cases. 
However, when a DDoS flooding attack is ongoing, the attack flows will dominate 
the traffic on local area network routers, and consequently, the flow entropy drops 
dramatically in a very short time period, such as a few seconds. Therefore, we can 
raise a DDoS attack alarm when flow entropy decreases significantly in a short time 
interval. 

We make the following assumptions in order for the following discussion to be 
clearly understood. 


• All attack packets for a given attack session come from one botnet. In other
words, they are generated by the same attack tools. It is possible that a victim is
attacked by different botnets at the same time; however, we only discuss the one
botnet case in this chapter.

• The attack packets enter the local area network via a minimum of two edge
routers and attack flows merge at the junction routers.

• The whole network system is linear and stable when the DDoS attack is ongoing.


The detection algorithm runs on all routers in the local area network. The
routers, especially the edge routers, monitor the network traffic using flow entropy
as the metric. In an attack free case, the flow entropy remains in a stable range. Once
there is an attack, the flow entropy drops dramatically because one or
a number of flows dominate on the routers. As a result, our detection task is to
find a suitable threshold, $\Delta$, for the decrease of flow entropy. When the decrease of
flow entropy is equal to or greater than $\Delta$, we treat it as a DDoS attack. We discuss this method
as follows.

Suppose in an attack scenario, an attacker uses a random variable X to control
the generation speed of attack packets (we call it the attack rate or the packet rate of attack
flows). For example, if the attacker generates packets at a constant speed, then


$$\Pr\{X = C\} = 1, \qquad (3.19)$$


where C is a constant.
If the attacker increases the number of packets with the attacking time t, then


$$X = a \cdot t + b, \qquad (3.20)$$


where a and b are constants.
If the attacker mimics the network traffic pattern with the Poisson distribution, then


$$\Pr\{X = k\} = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad (3.21)$$

where $k = 0, 1, \ldots$, and $\lambda$ is a constant.
We use a random variable X to represent the packet rates of flows on a router
within a given time interval. Let vector $X = \{x_1, x_2, \ldots, x_n\}$ denote the numbers of
packets for n flows, respectively. We then have the probability distribution of the
flows as


$$p(x_i) = x_i \cdot \left( \sum_{j=1}^{n} x_j \right)^{-1}. \qquad (3.22)$$


We use $H_f(X)$ to represent the flow entropy of flows on a given router. According
to our previous definitions, we have


$$H_f(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i). \qquad (3.23)$$


As discussed previously, $H_f(X)$ is stable in general with minor fluctuations in
normal network operations. However, in a DDoS attack scenario, the packet rate of
the flow that targets the victim is significantly larger than the packet rates of other
legitimate flows at the same router. Therefore, $H_f(X)$ decreases dramatically. Let
$\Delta$ ($\Delta > 0$) be a given real number as the threshold. We can then use the following
inequality to identify DDoS flooding attacks.


$$H_f(X)\big|_{t=t_0} - H_f(X)\big|_{t=t_0+\Delta t} \geq \Delta, \qquad (3.24)$$


where t represents time, and $\Delta t$ ($\Delta t > 0$) represents a short time interval. Our task is
to find a suitable threshold $\Delta$.
If $H_f(X)$ is differentiable at $t_0$, then inequality (3.24) can be further developed using


$$H_f'(X) = \lim_{\Delta t \to 0} \frac{H_f(X)\big|_{t=t_0+\Delta t} - H_f(X)\big|_{t=t_0}}{\Delta t}. \qquad (3.25)$$

Combining inequality (3.24) and Eq. (3.25), we obtain

$$H_f'(X) \leq -\Delta. \qquad (3.26)$$
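
The flow entropy of Eqs. (3.22) and (3.23) and the threshold test of inequality (3.24) can be sketched as follows. The packet counts and the threshold value are made-up numbers for illustration, and the logarithm is taken to base 2 so that the result is in bits; the function names are our own.

```python
import math


def flow_entropy(packet_counts):
    """Entropy (in bits) of the flow distribution p(x_i) = x_i / sum_j x_j."""
    total = sum(packet_counts)
    return -sum((c / total) * math.log2(c / total) for c in packet_counts if c > 0)


def entropy_drop_alarm(counts_before, counts_after, threshold):
    """Raise an alarm when the flow entropy decreases by at least the threshold."""
    return flow_entropy(counts_before) - flow_entropy(counts_after) >= threshold


if __name__ == "__main__":
    normal = [10, 10, 10, 10]      # four flows of similar size
    attack = [10, 10, 10, 970]     # one flow dominates the router
    print(flow_entropy(normal), flow_entropy(attack))
    print(entropy_drop_alarm(normal, attack, threshold=1.0))
```

In the example, the entropy falls from 2 bits to roughly 0.24 bits once a single flow dominates, so a hypothetical threshold of 1 bit triggers the alarm, whereas small fluctuations among the flows do not.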


3.4 Detection Against Mimicking Attacks 


It is critical for defenders to distinguish DDoS attacks that mimic flash crowds from
genuine flash crowds [1, 22, 34]. If we fail to achieve this, attackers can
mimic the traffic features of flash crowds to disable our detectors; on the
other hand, our detectors may treat legitimate flash crowds as DDoS attacks (also
referred to as false positives). Early research on discriminating DDoS attack flows
from legitimate flows can be found in [35]; however, delay and convergence issues
are not explored in that work. Jung et al. [34] tried to discriminate flash crowds
from DoS attacks using three features: traffic patterns, client characteristics and
file reference characteristics. Unfortunately, this countermeasure cannot
follow the ever changing methods of attack. Moreover, an attacker is able
to disable the detector easily by mimicking the network traffic patterns of flash
crowds. The entropy detector mentioned in the survey [22] came from [14]. Entropy
detectors have the capacity to raise alarms for crowd access; however, they have difficulty
discriminating DDoS attacks from surges of legitimate accesses such as flash crowds.
Chen and Hwang tried to separate flash crowds from DDoS flooding using the
change-point detection method [1].

Human behavior has been employed to discriminate DDoS attacks from flash
crowds. Xie and Yu [36, 37] used user browsing dynamics to differentiate flash
crowds and DDoS attacks. In general, the popularity of web pages for a given web
site follows the Zipf distribution; however, DDoS attack requests do not possess
this property. Moreover, user browsing patterns are also used to achieve the goal,
e.g. the number of requests from a user in a given time interval. A semi-Markov
model based access matrix has been established to carry out the differentiation
task. Oikonomou and Mirkovic tried to differentiate the two by modeling human
behavior, e.g. request dynamics and request semantics [38].

Based on the available literature, we found the following facts concerning the 
current botnets. 


1. The attack tools are prebuilt programs, which are usually the same for one botnet.
A botmaster issues a command to all bots in his botnet to start one attack session.
2. The attack flows that we observe at the victim end are an aggregation of many
original attack flows. The aggregated attack flows share a similar standard
deviation with an original attack flow, and this standard deviation is usually
smaller than that of genuine flash crowd flows. The reason for this phenomenon
is that the number of live bots of a current botnet is far smaller than the number of
concurrent legitimate users of a flash crowd. Rajab et al. [39] recently reported
that the number of live bots of a botnet is in the hundreds or a few thousands at a given
time point. However, we observed that the number of concurrent users of the flash
crowds of World Cup 98 is in the hundreds of thousands. Therefore, in order
to launch a flash crowd attack, a botmaster has to force his live bots to generate
many more attack packets, e.g., web page requests, than a legitimate user does.


Based on this observation, we found that the similarity among the current DDoS 
attack flows is higher than that of a flash crowd. Therefore, we can take advantage 
of this feature to discriminate DDoS attacks from flash crowds. 


3.4.1 Similarity Metrics 


Similarity or distance measurement has been explored extensively for many 
years. Researchers have invented many metrics for similarity measurement, which 
include first order and second order metrics. For example, the mean and the 
Kullback-Leibler distance are first order metrics, while standard deviation and 
correntropy [40] are second order metrics. Of course, there are many different 
metrics for different purposes and applications; we discuss only a few of them here 
for our detection purpose. 

For two given flows or sequences $P$ and $Q$, we denote their probability distributions 
as $p(x)$ and $q(x)$, respectively. In order to measure the distance or similarity between 
them, we need metrics. 

A frequently used distance metric is the Kullback-Leibler distance (KL distance 
for short), which is defined as 

$$D(p, q) = \sum_{x \in \chi} p(x) \log \frac{p(x)}{q(x)}, \tag{3.27}$$

where $\chi$ is the sample space of $x$. In general, $D(p, q) \neq D(q, p)$ if $p \neq q$. As 
a result, the KL distance is not a metric in a rigorous sense, although it has been 
widely used in practice. 


The Jeffrey distance fixes this asymmetry by combining the two Kullback-Leibler 
distances, and is defined as follows. 

$$D_J(p, q) = \frac{1}{2}\left[D(p, q) + D(q, p)\right]. \tag{3.28}$$


A third metric in this category is the Sibson distance, which is defined as 

$$D_S(p, q) = \frac{1}{2}\left[D\left(p, \frac{p + q}{2}\right) + D\left(q, \frac{p + q}{2}\right)\right]. \tag{3.29}$$

Different from the previous KL distance based metrics, the Hellinger distance is 
defined as follows. 

$$D_H(p, q) = \left[\sum_{x \in \chi}\left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2\right]^{1/2}. \tag{3.30}$$


All the metrics that we have seen can be categorized as first order metrics. Our 
previous study [41] indicates that the Sibson distance is the best metric among these 
first order metrics for DDoS attack detection purpose. 
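
The four first order metrics of Eqs. (3.27) to (3.30) can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the two distributions are given as lists of probabilities over the same sample space, $q(x) > 0$ wherever $p(x) > 0$, the natural logarithm is used, and the function names are hypothetical.

```python
import math

def kl(p, q):
    """KL distance D(p, q) of Eq. (3.27); terms with p(x) = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jeffrey(p, q):
    """Jeffrey distance of Eq. (3.28): the average of the two KL distances."""
    return 0.5 * (kl(p, q) + kl(q, p))

def sibson(p, q):
    """Sibson distance of Eq. (3.29): KL distances to the midpoint (p + q) / 2."""
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * (kl(p, mid) + kl(q, mid))

def hellinger(p, q):
    """Hellinger distance of Eq. (3.30)."""
    return math.sqrt(sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                         for pi, qi in zip(p, q)))

if __name__ == "__main__":
    p = [0.7, 0.2, 0.1]   # illustrative distributions, not measured data
    q = [0.4, 0.4, 0.2]
    print(kl(p, q), kl(q, p))          # the two directions differ
    print(jeffrey(p, q), sibson(p, q), hellinger(p, q))
```

Unlike the KL distance, the other three functions return the same value when their arguments are swapped.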

In the following we discuss two second order metrics. The first one is correlation, 
which is widely used in engineering. 

Let $X_i$ and $X_j$ ($i \neq j$) be two flows with the same length $N$; then the correlation 
between the two flows is defined as 

$$r_{X_i,X_j} = \frac{1}{N}\sum_{n=1}^{N} x_i[n]\,x_j[n]. \tag{3.31}$$


The correlation is used to describe the similarity of different flows. However, in 
some cases, it may indicate zero correlation although the two flows are completely 
correlated but with a phase difference. Therefore, the definition is modified to be 
practical as follows. 

$$r_{X_i,X_j}[k] = \frac{1}{N}\sum_{n=1}^{N} x_i[n]\,x_j[n + k], \tag{3.32}$$

where $k$ ($k = 0, 1, 2, \ldots, N - 1$) is the position shift of flow $X_j$. 

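However, the same degree of similarity may still yield different magnitudes 
in different scenarios; therefore, normalization is necessary. We define the correlation 
coefficient of the two flows as 

$$\rho_{X_i,X_j}[k] = \frac{r_{X_i,X_j}[k]}{\frac{1}{N}\left[\sum_{n=1}^{N} x_i^2[n] \sum_{n=1}^{N} x_j^2[n]\right]^{1/2}}. \tag{3.33}$$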
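
A minimal Python sketch of Eqs. (3.31) to (3.33) is given below. The text does not say how the index $n + k$ is handled once it exceeds $N$; the sketch assumes a circular shift, which is our own choice, and the function names are hypothetical.

```python
import math

def correlation(x, y, k=0):
    """r[k] of Eq. (3.32); k = 0 gives Eq. (3.31).

    Assumption: indices wrap around (circular shift) when n + k exceeds N.
    """
    n = len(x)
    return sum(x[i] * y[(i + k) % n] for i in range(n)) / n

def correlation_coefficient(x, y, k=0):
    """rho[k] of Eq. (3.33): r[k] divided by (1/N) * sqrt(sum x^2 * sum y^2)."""
    n = len(x)
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y)) / n
    return correlation(x, y, k) / norm

def best_shift(x, y):
    """Shift k in 0..N-1 that maximises the correlation coefficient."""
    return max(range(len(x)), key=lambda k: correlation_coefficient(x, y, k))

if __name__ == "__main__":
    x = [1, 5, 2, 8, 3, 9]
    # y is x scaled by 3 and delayed by two positions (illustrative data)
    y = [3 * x[(j - 2) % len(x)] for j in range(len(x))]
    k = best_shift(x, y)
    print(k, correlation_coefficient(x, y, k))
```

Because of the normalization, the scaling factor of the second flow does not affect the coefficient; only the shape and the alignment of the two sequences matter.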


The correlation coefficient has been used as a similarity metric for various network 
flow applications, for example, fast similarity search for video sequences on the web 
[42], distance learning on images [43], and similarity measurement among VoIP 
flows [26]. We noticed that abstract distance does not include time information, 
and it is sensitive to fluctuation of flows; the correlation coefficient is 
better than abstract distances in terms of stability. There are also some variants of 
the correlation coefficient as metrics for similarity, such as the order statistics 
correlation coefficient [44]. This method sorts the original items of the sequence and, 
as a result, the timing information of the original signal is lost; in this respect it is 
similar to the spectrum methodology. 

The next second order metric we discuss is correntropy, which is a recently 
invented local tool for second-order similarity measurement in statistics. It works 
independently on pair-wise arbitrary samples. Correntropy is symmetric, positive, 
and bounded, and it has a clear theoretical foundation. 

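For any two finite data sequences $A$ and $B$, suppose we have the samples 
$\{(A_j, B_j)\}_{j=1}^{m}$, $m \in \mathbb{N}$; then the similarity of the sequences is estimated as 

$$\hat{V}_{m,\sigma}(A, B) = \frac{1}{m}\sum_{j=1}^{m} k_\sigma(A_j - B_j), \tag{3.34}$$

where $k_\sigma(\cdot)$ is the Gaussian kernel, which is usually defined as 

$$k_\sigma(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{x^2}{2\sigma^2}\right). \tag{3.35}$$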


Correntropy is widely used in various disciplines, such as face recognition [45]. 
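
The estimator of Eq. (3.34) with the Gaussian kernel of Eq. (3.35) can be sketched as follows. The kernel width `sigma` is a free parameter that the text does not fix; the default of 1.0 and the function names are assumptions made for illustration only.

```python
import math

def gaussian_kernel(x, sigma):
    """Gaussian kernel of Eq. (3.35)."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)

def correntropy(a, b, sigma=1.0):
    """Sample estimator of Eq. (3.34) over the first m paired samples."""
    m = min(len(a), len(b))
    return sum(gaussian_kernel(a[j] - b[j], sigma) for j in range(m)) / m

if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]   # illustrative sequences
    b = [1.5, 2.0, 2.0]
    print(correntropy(a, b), correntropy(a, a))
```

The estimate is largest, $1/(\sqrt{2\pi}\,\sigma)$, when the two sequences are identical, and it decreases as paired samples move apart.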


3.4.2 Flow Correlation Based Discrimination 


Most detection is conducted at the potential victim end, which we usually 
call the community network. 

A sample community network with flows is shown in Fig. 3.2. In the sample 
community network, $R_2$ and $R_3$ are the edge routers, and the server is a potential 
victim that we try to protect. There are two incoming flows, $X_i$ and $X_j$. They enter 
the community network via different paths, merge at router $R_1$, and are both 
addressed to the potential victim. We sample the number of packets of a given network 
flow with a given time interval. Therefore, a network flow can also be represented 
by a data sequence $X_i[n]$, where $i$ ($i \geq 1$) is the index of network flows, and $n$ denotes 
the $n$th element in the data sequence. For example, if the length of a given network 
flow $X_i$ is $N$, then the network flow can be presented as follows. 

$$X_i[n] = \{x_i[1], x_i[2], \ldots, x_i[N]\}, \tag{3.36}$$

where $x_i[k]$ ($1 \leq k \leq N$) represents the number of packets that we counted in the $k$th 
time interval for the network flow. According to our definition of flow, a router may 
have many network flows at any given point of time. 


Definition 3.4.1. Flow strength. For a network flow $X_i$, let the length of the network 
flow be $N$ ($N \geq 1$). We define the expectation of the data sequence as the flow 
strength of $X_i$ as follows. 

$$E[X_i] = \frac{1}{N}\sum_{n=1}^{N} x_i[n]. \tag{3.37}$$

Flow strength represents the average packet rate of a network flow. If $X_i$ is a 
DDoS attack flow, then we call $E[X_i]$ the attack strength. 


Definition 3.4.2. Fingerprint of flow. For a given network flow $X_i$ with length $N$, its 
fingerprint $X'_i$ is the unified representation of $X_i$, namely 

$$X'_i = \left\{\frac{x_i[1]}{N \cdot E[X_i]}, \frac{x_i[2]}{N \cdot E[X_i]}, \ldots, \frac{x_i[N]}{N \cdot E[X_i]}\right\}. \tag{3.38}$$

Following this definition, we know $\sum_{k=1}^{N} x'_i[k] = 1$. We can see that the fingerprint of 
a flow is essentially an instance of its probability density distribution. 

Based on Eqs. (3.37) and (3.38), we obtain the following relationship between 
a network flow and its fingerprint. 

$$X_i = N \cdot E[X_i] \cdot X'_i. \tag{3.39}$$
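
The definitions in Eqs. (3.37) to (3.39) translate directly into code. The sketch below assumes a flow is a non-empty list of per-interval packet counts with at least one non-zero entry; the function names are hypothetical.

```python
def flow_strength(flow):
    """Flow strength E[X_i] of Eq. (3.37): the average packet count per interval."""
    return sum(flow) / len(flow)

def fingerprint(flow):
    """Fingerprint X'_i of Eq. (3.38): each sample divided by N * E[X_i]."""
    total = len(flow) * flow_strength(flow)
    return [x / total for x in flow]

if __name__ == "__main__":
    flow = [12, 8, 10, 10]            # illustrative packet counts
    fp = fingerprint(flow)
    print(flow_strength(flow), fp, sum(fp))
    # Eq. (3.39): the flow is recovered as N * E[X_i] * X'_i
    scale = len(flow) * flow_strength(flow)
    print([scale * v for v in fp])
```

Two flows that differ only in strength, for example `[12, 8, 10, 10]` and `[36, 24, 30, 30]`, share the same fingerprint.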


As previously discussed, the current botnets, such as SDbot, Rbot and Spybot, 
employ the same program to generate attack packets. Furthermore, they try to create 
as many attack packets as they can, usually with a very short delay (1 or 5 ms) 
between two attack packets. This confirms that flow fingerprint does exist in attack 
flows for a given botnet. 

We set up an overlay network on the routers in the community network that 
we have control over. We execute software on every router to count the number of 
packets for every flow and record this information for a short period of time at every 
router. Once an attack alarm goes off, the similarity among different suspected flows 
is calculated, and a decision on whether it is a DDoS attack or a flash crowd is 
made based on the global information of the community network. Under this 
framework, the requirement of storage space is very limited and an online decision 
can be achieved. 

A real community network may be much more complex, with more routers and servers 
than the sample network in Fig. 3.2. However, for a given server, we can always 
treat the related community network as a tree, which is rooted at the server. We 
must point out that the topology of the community network has no impact on our 
detection strategy, whether it is a graph or a tree, because our detection method is 
based on flows rather than on network topology. 

Once a surge on the server occurs, our task is to identify whether it is a genuine 
flash crowd or a DDoS attack. As we just discussed, when a possible DDoS attack 
alarm goes off, the routers in the community network start to sample the suspected 
flows by counting the number of packets for a given time interval, for example, 
100 ms. When the length of a flow, $N$, is sufficient, we start to calculate the flow 
correlation coefficient among suspected flows. We discriminate DDoS attacks from 
flash crowds based on the aforementioned fact: in DDoS attacks, the suspicious 
network flows have a strong correlation although they are a mixture of a number 
of original attack flows with different delays. On the other hand, the flow correlation 
coefficient between two flash crowd flows is weaker than that between two attack flows. 

Suppose we have sampled $M$ network flows, $X_1, X_2, \ldots, X_M$; we can then 
obtain the flow correlation coefficient of any two network flows, $X_i$ ($1 \leq i \leq M$) and 
$X_j$ ($1 \leq j \leq M$, $i \neq j$). Let $I_{X_i,X_j}$ be the indicator function for flows $X_i$ and $X_j$; 
$I_{X_i,X_j}$ has only two possible values: 1 for DDoS attacks and 0 otherwise. Let $\delta$ be 
the threshold for the discrimination, then we have 

$$I_{X_i,X_j} = \begin{cases} 1, & \rho_{X_i,X_j}[k] \geq \delta \\ 0, & \text{otherwise,} \end{cases} \tag{3.40}$$

where $1 \leq i, j \leq M$, and $i \neq j$. 

In general, we may have more than two suspected flows in a community network. 
This means we can conduct a number of different pairwise comparisons, and the 
final decision can be derived from them in order to improve the reliability of our 
decision. 


3.4.3 System Analysis on the Discrimination Method 


In this section, we first present the difference between flash crowd traffic and 
DDoS attack traffic, and then we prove that the two can be discriminated using the 
flow correlation coefficient. Following this foundation, we theoretically analyze the 
effectiveness of the proposed discrimination method, and prove that the threshold $\delta$ 
in Eq. (3.40) does exist. We then explore the relationship between the flow correlation 
coefficient and the length of flows. These results pave the way for the implementation of 
the proposed method in practice. The attack flow converging issue is also addressed 
in this section. 

In order to make our analysis clear, we make the following assumptions. 


1. There is only one server in a community network which is under attack or 
experiencing flash crowds at any given time. 

2. The attack packets enter the community network via a minimum of two different 
edge routers. 

3. In one attack session, all the attack packets are generated by only one botnet, 
therefore the fingerprints of the attack flows are the same. 

4. The network delays are discrete and countable. 


First of all, we investigate the difference between a DDoS attack and a flash 
crowd in terms of traffic distribution. Suppose there is a flash crowd whose 
statistics are known to everyone (including attackers), say with mean $n\mu$. Moreover, 
the botmaster has $n$ live bots to execute a flash crowd attack. As we know from 
previous research [46], the number of live bots is usually in the hundreds or thousands, 
whereas the number of users that generate a flash crowd is much larger, for 
example, 360,000 browsers at the same time. As a result, the botmaster must push all 
live bots to generate as much attack traffic as they can, and on average the mean has 
to be as much as $\mu$ per bot. For this reason, the time interval between attack packets 
should be as small as possible in order to pump out sufficient attack packets. This 
makes the standard deviation of packet arrivals very small, for example $\sigma = 0.01\mu$ 
($\sigma = 0$ is the best from the hackers' viewpoint). All the attack traffic is aggregated and 
measured at the victim's location. Although the aggregated attack traffic attains the 
mean of the flash crowd, $n\mu$, its standard deviation is bounded in a 
narrow range, e.g., 0.01. On the other hand, the flash crowd traffic is created by 
many users and follows a genuine distribution, so its standard deviation is much larger 
than that of the attack traffic. 

We give an example to illustrate this in Fig. 3.3. The aggregated traffic 
comes from 50 Gaussian distributions, with $\mu = 10$ and $\sigma = 0.01\mu = 0.1$ for each bot; 
the single distribution is a Gaussian distribution with $\mu = 50$ and $\sigma' = 0.01\mu = 0.5$. 
The difference discussed above can be seen in the figure. 
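
As a rough check on why the aggregate stays narrow, and assuming that the $n$ bot flows are statistically independent and identically distributed (an assumption that is not stated in the text), the mean and standard deviation of the aggregated flow $S$ can be written as follows.

```latex
E[S] = n\mu, \qquad
\sigma_S = \sqrt{n}\,\sigma, \qquad
\frac{\sigma_S}{E[S]} = \frac{\sigma}{\sqrt{n}\,\mu}
% With n = 50 and \sigma = 0.01\mu:
% \sigma_S / E[S] = 0.01 / \sqrt{50} \approx 0.0014
```

Under this assumption the relative spread of the aggregate shrinks as more bots are added, which is in line with the narrow bound described above.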

Following this observation, we will theoretically prove that the DDoS attack 
traffic and flash crowd traffic can be differentiated using flow correlation coefficient 
as a metric. 


Fig. 3.3 The difference between an aggregated attack traffic and a flash crowd traffic 


Theorem 3.4.1. Let $X_i$ and $X_j$ ($i \neq j$) be two traffic flows that share the same 
distribution, and let the standard deviation $\sigma$ be a random variable. Then the correlation 
coefficient of the two flows is inversely proportional to $\sigma$, namely, $\rho_{X_i,X_j} \propto \frac{1}{\sigma}$. 


Proof. Let $f_i$ and $f_j$ be the mathematical functions that generate $X_i$ and $X_j$, 
respectively. Let the random variable $u$ be the mean of the distribution, and let another 
random variable $\sigma$ represent the standard deviation; we know that the mean of $\sigma$ is 0. 
Without loss of generality, let 

$$f_i = u + \sigma$$

$$f_j = u - \sigma$$

As $X_i$ and $X_j$ possess the same mean, $\mu$, and there is no phase issue for the two 
flows, we can simply use Eq. (3.31). Suppose we have $N$ samples in $X_i$ 
and $X_j$, respectively. We have 

$$X_i = \{u_1 + \sigma, u_2 + \sigma, \ldots, u_N + \sigma\}$$

$$X_j = \{u_1 - \sigma, u_2 - \sigma, \ldots, u_N - \sigma\}$$

Therefore, 

$$\rho_{X_i,X_j} = r_{X_i,X_j} = \frac{1}{N}\sum_{n=1}^{N}\left(u_n^2 - \sigma^2\right) \approx \mu^2 - \sigma^2,$$

where $\mu$ is the mean of the random variable $u$. Once $\mu$ is given, a larger standard 
deviation $\sigma$ results in a lower correlation coefficient between the flows, and vice versa. 


Based on Theorem 3.4.1, we can differentiate DDoS attack flows 
from flash crowds because the standard deviations of these two phenomena are 
different. We have to bear in mind that Theorem 3.4.1 is proved under the assumption 
of no background noise. 

We now investigate the flow correlation coefficient of any two independent 
network flows, such as flash crowd flows. Previous research has demonstrated 
that web traffic follows the Pareto law [47,48]; hence, the Pareto distribution 
represents the flow fingerprint of flash crowds. The definition of the distribution is 
as follows. 

Let $X$ be a random variable, and $x_m$ be the minimum time interval for arrival 
packets. For a given arrival time interval $x$, the probability density function of the 
Pareto distribution is given by 

$$\Pr\{X = x\} = \alpha \cdot x_m^{\alpha} \cdot x^{-(\alpha + 1)}, \tag{3.41}$$

where $x_m \leq x$, and $\alpha$ is the Pareto index. 

Theorem 3.4.2. Given two independent and identically distributed flash crowd flows 
$X_i$ and $X_j$ ($i \neq j$) with the same length $N$, we have $\lim_{N \to \infty} \rho_{X_i,X_j}[k] = 0$. 


Proof. As flash crowds, $X_i$ and $X_j$ follow Eq. (3.41). Let $t$ be the index for both $X_i$ 
and $X_j$. The probability of $x_i[t] = x_j[t] = x$ is 

$$\Pr\{x_i[t] = x_j[t] = x\} = \left(\frac{\alpha x_m^{\alpha}}{x^{\alpha + 1}}\right)^2 < 1. \tag{3.42}$$

If $X_i = X_j$, namely $x_i[t] = x_j[t]$ for all $t$ ($1 \leq t \leq N$), then the probability is 

$$\Pr\{X_i = X_j\} = \left[\Pr\{x_i[t] = x_j[t] = x\}\right]^N = \left(\frac{\alpha x_m^{\alpha}}{x^{\alpha + 1}}\right)^{2N}. \tag{3.43}$$

Based on (3.42) and (3.43), we obtain 

$$\lim_{N \to \infty} \rho_{X_i,X_j}[k] = \lim_{N \to \infty} \Pr\{X_i = X_j\} = 0.$$


Theorem 3.4.2 shows that for any two independent flash crowd flows with length 
$N$, the flow correlation coefficient approaches 0 when $N$ approaches infinity. 
We can easily obtain the following corollary by extending Theorem 3.4.2. 


Corollary 3.4.1. For two independent flash crowd flows $X_i$ and $X_j$ with the same length 
$N$, $\forall \delta$ ($\delta < 1$), $\exists N'$ such that $\rho_{X_i,X_j}[k] < \delta$ when $N > N'$. 

We now move on to explore the flow correlation coefficient among DDoS attack 
flows. Let us first find the expression of a DDoS attack flow, $X_i$, which we obtain 
at an edge router. Suppose the observed attack flow is a mixture of attack flows 
that come from $K$ different bots, and let $X'_0$ represent the fingerprint of the attack 
flows. Based on the aforementioned discussion, the fingerprint of different attack 
flows in one attack session is the same, except that there are delays in different 
attack flows. Let $X'_0[j]$ represent the fingerprint that is delayed $j$ time units. As a 
result, the observed attack flow can be denoted as follows. 

$$X_i = \sum_{j=1}^{k} N \cdot E[X_j] \cdot X'_0[j] = \sum_{j=1}^{k} a_j \cdot X'_0[j], \tag{3.44}$$

where $a_j$ ($1 \leq j \leq k \leq K$) represents the magnitude of the attack flows that possess 
the same delay $j$ at the edge router. 


Theorem 3.4.3. Let $X'_0$ be the fingerprint of attack flows for one attack session. 
Under the condition of no network delay and no background noise, for two mixed 
attack flows $X_i$ and $X_j$ ($i \neq j$) that we observe at two edge routers, the correlation 
coefficient of $X_i$ and $X_j$ is 1, namely, $\rho_{X_i,X_j}[k] = 1$. 


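Proof. Under the delay-free and noise-free condition, based on Eq. (3.44), we can 
write $X_i$ and $X_j$ as follows. 

$$X_i = \sum_{p=1}^{k_1} a_{1p} \cdot X'_0$$

$$X_j = \sum_{q=1}^{k_2} a_{2q} \cdot X'_0$$

Let vectors $B_i = (a_{11}, a_{12}, \ldots, a_{1k_1})$, $B_j = (a_{21}, a_{22}, \ldots, a_{2k_2})$, and $k = \max(k_1, k_2)$. 
We make the lengths of $B_i$ and $B_j$ equal by padding the shorter one with zeros. 
We then obtain $A_i = (a_{11}, a_{12}, \ldots, a_{1k})$ and $A_j = (a_{21}, a_{22}, \ldots, a_{2k})$ from $B_i$ and 
$B_j$, respectively. Following these, we can rewrite $X_i$ and $X_j$ as follows. 

$$X_i = X'_0 \cdot A_i$$

$$X_j = X'_0 \cdot A_j,$$

where $A_i$ and $A_j$ are the mixture matrices for the different mergings of the attack flows, 
respectively. 

As we can always shift one flow to match the other to make them synchronized, 
based on Eq. (3.32), we obtain 

$$r_{X_i,X_j}[k] = \frac{1}{N}\sum_{n=1}^{N} x_i[n]\,x_j[n + k] = \frac{1}{N}\sum_{n=1}^{N}\left(x'_0[n]\right)^2 (A_i \cdot A_j). \tag{3.45}$$

Combining Eq. (3.45) with Eq. (3.33), we obtain 

$$\rho_{X_i,X_j}[k] = \frac{r_{X_i,X_j}[k]}{\frac{1}{N}\left[\sum_{n=1}^{N} x_i^2[n] \sum_{n=1}^{N} x_j^2[n]\right]^{1/2}} = \frac{\frac{1}{N}\sum_{n=1}^{N}\left(x'_0[n]\right)^2 (A_i \cdot A_j)}{\frac{1}{N}\left[(A_i \cdot A_i)\sum_{n=1}^{N}\left(x'_0[n]\right)^2 (A_j \cdot A_j)\sum_{n=1}^{N}\left(x'_0[n]\right)^2\right]^{1/2}}. \tag{3.46}$$

Because $A_i \cdot A_j = A_j \cdot A_i$, substituting this in Eq. (3.46), we have 

$$\rho_{X_i,X_j}[k] = \frac{r_{X_i,X_j}[k]}{\frac{1}{N}\left[(A_i \cdot A_j)^2 \left(\sum_{n=1}^{N}\left(x'_0[n]\right)^2\right)^2\right]^{1/2}} = 1.$$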

Theorem 3.4.3 demonstrates that in an ideal delay-free and noise-free 
environment, any two DDoS attack flows from one botnet are totally correlated 
because they are combinations of attack flows from different bots with different 
routes. 

In reality, however, delay and noise do exist, and bots in a centralized botnet 
are coordinated by their botmaster. This means the delays among the attack flows 
from different bots depend on normal Internet delays, and are therefore limited 
given fast Internet transportation facilities. As a result, the delay-free 
condition can be satisfied to some degree. On the other hand, the noise in attack flows 
consists of the legitimate packets that are addressed to the victim while 
a DDoS attack is ongoing. However, the strength of the noise is much smaller 
than the strength of the DDoS flooding attack flows. 

Following Theorem 3.4.3, we can further have the following corollary. 


Corollary 3.4.2. Let $Y_i$ and $Y_j$ be the noises for two DDoS attack flows $X_i$ and $X_j$ 
from one attack session. $\forall \delta$ ($\delta < 1$), $\exists \Delta$ such that $\rho_{X_i,X_j}[k] > \delta$ holds when 
$\frac{E[X_i]}{E[Y_i]} > \Delta$ and $\frac{E[X_j]}{E[Y_j]} > \Delta$. 

Proof. From Theorem 3.4.3, we know that $\rho_{X_i,X_j}[k] = 1$ when $E[Y_i] = E[Y_j] = 0$. On 
the other hand, if the strength of noise is much stronger than that of signal, namely, 
$E[Y_i] \gg E[X_i]$ and $E[Y_j] \gg E[X_j]$, then 

$$Y_i \approx Y_i + X_i$$

$$Y_j \approx Y_j + X_j$$

As a result, $\rho_{Y_i + X_i,\,Y_j + X_j}[k] \approx \rho_{Y_i,Y_j}[k]$. Based on Theorem 3.4.2, we know $\rho_{Y_i,Y_j}[k] \to 0$ 
when the length $N$ increases, namely, $\rho_{Y_i + X_i,\,Y_j + X_j}[k] \to 0$ when $N$ is sufficiently large. 


Corollary 3.4.2 indicates that the correlation coefficient of DDoS attack flows 
approaches 1 if the signal-to-noise ratio (SNR), $\frac{E[X]}{E[Y]}$, is sufficiently large. It is true 
that $E[X_i] > E[Y_i]$ and $E[X_j] > E[Y_j]$ for the DDoS flooding attack cases; therefore, 
the correlation coefficient of attack flows is close to 1 in an ongoing DDoS attack 
scenario. 


Theorem 3.4.4. DDoS attack flow can be discriminated from flash crowds by 
flow correlation coefficient at edge routers under two conditions: the length of 
the sampled flow is sufficiently large, and the DDoS flooding attack strength is 
sufficiently strong. 


Proof. Let $X_i$ and $X_j$ ($i \neq j$) be two random flash crowd flows, $X_p$ and $X_q$ ($p \neq q$) be two 
DDoS flooding attack flows, and let $\delta$ ($\delta < 1$) be a given small real number. Based 
on Corollary 3.4.1, the following equation holds with a condition on $N$. 

$$\Pr\{\rho_{X_i,X_j}[k] < \delta \mid N\} = 1 \tag{3.47}$$

From Corollary 3.4.2, we are sure that the following equation holds as well with 
conditions on $N$ and the signal-to-noise ratio (SNR). 

$$\Pr\{\rho_{X_p,X_q}[k] \geq \delta \mid N, \mathrm{SNR}\} = 1 \tag{3.48}$$

We know that $\rho_{X_i,X_j}[k]$ is a decreasing function of the length of flow, $N$; $\rho_{X_p,X_q}[k] = 
1$ when there is no noise and no delay, and it decreases when the strength of noise 
increases. Therefore, there must exist a point where both Eq. (3.47) and (3.48) hold, 
and Theorem 3.4.3 holds as well. 


It is practical to obtain an upper bound, $\delta$, of the flow correlation coefficient 
for flash crowds for a given flow length. If the flow correlation coefficient of two 
flows is greater than $\delta$, then they are DDoS attack flows. 

In a DDoS attack or flash crowd, we usually have a number of suspected 
flows. Suppose we have at most $M$ suspected flows, from which we calculate the 
flow correlation coefficient, and hence the indicator $I_{X_i,X_j}$, for any two different flows 
$X_i$ and $X_j$ ($1 \leq i, j \leq M$, $i \neq j$). We can therefore have an integrated DDoS attack 
positive probability as follows. 

$$\Pr\{I_A = 1\} = \frac{\sum_{1 \leq i,j \leq M,\, i \neq j} I_{X_i,X_j}}{\binom{M}{2}}, \tag{3.49}$$

where $I_A$ is the indicator for DDoS attacks, and $I_A = 1$ represents positive for DDoS 
attacks. We make our final decision with global information as follows. 

$$I_A = \begin{cases} 1, & \Pr\{I_A = 1\} \geq 0.5 \\ 0, & \Pr\{I_A = 1\} < 0.5 \end{cases} \tag{3.50}$$

In other words, it is a DDoS attack if at least half of the comparisons are positive; 
otherwise, it is not a DDoS attack. 
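
The decision rule of Eqs. (3.40), (3.49) and (3.50) can be sketched as below. This is an illustration under our own assumptions rather than the implementation used in this study: the flows are taken to be already aligned (shift $k = 0$), each unordered pair of flows is counted once so that the ratio stays between 0 and 1, and the threshold value and the sample flows are hypothetical. In practice $\delta$ has to be calibrated on measured traffic.

```python
import math
from itertools import combinations

def correlation_coefficient(x, y):
    """Eq. (3.33) at shift k = 0; the 1/N factors cancel."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

def ddos_decision(flows, delta):
    """Return (I_A, Pr{I_A = 1}) following Eqs. (3.40), (3.49) and (3.50)."""
    pairs = list(combinations(range(len(flows)), 2))   # each unordered pair once
    positives = sum(
        1 for i, j in pairs
        if correlation_coefficient(flows[i], flows[j]) >= delta   # Eq. (3.40)
    )
    probability = positives / len(pairs)                # Eq. (3.49)
    return probability >= 0.5, probability              # Eq. (3.50)

if __name__ == "__main__":
    base = [1, 9, 1, 9, 1, 9, 1, 9]                     # shared fingerprint shape
    attack = [[2 * v for v in base], [5 * v for v in base], [3 * v for v in base]]
    crowd = [[9, 1, 1, 1, 1, 1, 1, 1],
             [1, 1, 1, 9, 1, 1, 1, 1],
             [1, 1, 1, 1, 1, 1, 9, 1]]
    print(ddos_decision(attack, delta=0.99))
    print(ddos_decision(crowd, delta=0.99))
```

Using several pairwise comparisons instead of a single one makes the decision less sensitive to one unusual pair of flows.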
Chapter 4 
Attack Source Traceback 


Abstract In this chapter, we investigate the attack source traceback in DDoS 
defence. We summarize the three major traceback methods to date: probabilistic 
packet marking, deterministic packet marking and network traffic based traceback 
methods. We formulate each traceback method, and present analysis for them, 
respectively. 


4.1 Introduction 


In cyber security, it is important to find out the attack sources, which is known as IP 
traceback or traceback. It is an open and challenging problem for the cyber security 
community. As most DDoS attacks are carried out by botnets, we refer to bots as the 
attack sources. Ideally, we would identify the bots in an attack case, for example 
by identifying the IP addresses of the attacking bots. However, it is extremely hard 
to achieve this at the moment. 

The current definition of IP traceback is identifying the closest routers or 
gateways to the real sources of attack packets. 

In order to make the following discussion go smoothly, we first clarify the 
definitions and terms that we use in IP traceback. 

As shown in Fig. 4.1, researchers usually treat a DDoS attack diagram as a tree 
$T$, which is rooted at the victim, $V$. The attack sources (bots) are located in LANs behind 
routers or gateways; we define those initial routers or gateways as the leaf nodes, 
which are denoted as a set $L$. Each route from $L$ to $V$ forms an attack path $P$, including the 
intermediate routers. Based on the current definition of IP traceback, we need to 
identify the nodes on the attack tree as far as possible, ideally the nodes in set $L$. 

Due to the memoryless feature of the Internet and the ease of source IP 
spoofing in attack packets, a victim cannot identify the source and attack path of 
an attack packet. As a result, all traceback schemes need the participation of 
Internet routers. However, some routers may not participate in the traceback 
process. For example, in Fig. 4.1, on the attack path $R_1 - R_2 - R_3 - R_4$, routers $R_2$ and 
$R_4$ are non-participating routers; therefore, we can trace back to $R_3$ at most, although 
$R_4$ is the leaf node. On the other attack path $R_5 - R_6 - R_7 - R_8$, we can possibly 
trace to the leaf node $R_8$ although there are some non-participating routers on the 
attack path. 

Fig. 4.1 A DDoS attack diagram in the Internet environment (legend: attack path, victim, participated router, non-participated router) 

The current IP traceback methods can be categorized into two classes: packet 
marking mechanisms and network traffic based mechanisms. The early work on IP 
traceback deployed packet marking. In the IPv4 packet header, there are some unused 
bits, usually 17, 19 or 24 bits for different underlying protocols [1]. Network 
operators can embed special marks or IDs in this available space for traceback 
purposes. Packet marking is currently the dominant method for IP 
traceback, and it includes two categories: Probabilistic Packet Marking (PPM) [2] 
and Deterministic Packet Marking (DPM) [3]. In general, the packet marking based 
traceback schemes suffer from a number of disadvantages, such as limited scalability 
and accuracy and a dependence on attack signatures. The traffic entropy variation based 
traceback can address some of the problems of packet marking [4]. 

All the available IP traceback schemes depend on the participation of Internet 
routers: the more routers participate, the better. Moreover, all IP traceback depends on 
successful DDoS detection. We study the three categories of IP traceback (PPM, DPM, 
and network traffic based traceback) in the following. 


4.2 Probabilistic Packet Marking Based Traceback 


The PPM strategy was first proposed in [2], and was further improved by 
researchers, such as in [5]. 

The basic idea of the PPM scheme is that in participating domains, e.g., ISP 
networks, every router injects special marks into the available packet space of incoming 
packets with a certain probability. At the victim end, we can establish an 
attack tree based on the received marked packets, and identify the attack sources 
based on the attack tree $T$. In order to establish a reliable attack tree, we have 
to accumulate a large number of marked packets, which poses a challenge for 
storage and computing power at the victim end. Moreover, the PPM scheme can 
only trace to the source nodes within its domain, which are usually far away from 
the attacking bots. 

Savage et al. [6] first introduced the probability-based packet marking method. Their 
starting point is node appending, which appends each node's address to the end of the 
packet as it travels from the attack source to the victim. Obviously, this is infeasible 
when the path is long or there is insufficient unused space in the original packet. The 
authors therefore proposed the node sampling algorithm, in which each router on the 
attack path writes its address into the packet with probability $p$. Then, the probability 
that a packet arrives marked by a router that is $d$ hops away from the victim is 
$p(1 - p)^{d-1}$. Based on the number of marked packets, we can reconstruct the attack path. 
However, a large number of packets is required to improve the accuracy of the attack path 
reconstruction. Therefore, an edge sampling algorithm was proposed to mark the 
start router address and end router address of an attack link and the distance between 
the two ends. The edge sampling algorithm fixed the problems of the node sampling 
algorithm to some extent. 
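
The marking probability $p(1 - p)^{d-1}$ of node sampling can be checked with a small simulation. The sketch below is our own illustration, not the code of [6]: routers are identified only by their hop distance from the victim, every router on the path participates, and the function names and parameters are hypothetical.

```python
import random

def marking_probability(p, d):
    """Probability that a packet arrives carrying the mark of the router d hops away."""
    return p * (1 - p) ** (d - 1)

def simulate_node_sampling(path_length, p, packets, seed=0):
    """Count, per hop distance, how many packets arrive with that router's mark.

    counts[d] is the number of packets marked by the router d hops from the
    victim; counts[0] is the number of packets that arrive unmarked.
    """
    rng = random.Random(seed)
    counts = [0] * (path_length + 1)
    for _ in range(packets):
        mark = 0
        # The packet travels from the farthest router to the closest one;
        # each router overwrites the mark with probability p.
        for d in range(path_length, 0, -1):
            if rng.random() < p:
                mark = d
        counts[mark] += 1
    return counts

if __name__ == "__main__":
    counts = simulate_node_sampling(path_length=5, p=0.2, packets=20000)
    for d in range(1, 6):
        print(d, counts[d] / 20000, marking_probability(0.2, d))
```

Routers farther from the victim are represented by fewer marked packets, which is why the victim can order the routers by mark frequency but needs many packets to do so reliably.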

Based on the PPM mechanism, the authors of [7] measured the traffic that targeted the 
victim to construct the attack diagram, and then identified where the attackers 
were located. They focused on the traffic flows that ended at the victim, and 
therefore, there was a tree rooted at the victim. For a router on the attack 
tree, the outgoing flow included two parts: the locally generated flows and the transit 
flows from the upstream router(s) of the attack tree. If $X_1$ and $X_2$ are two flows on the 
attack tree, and $X_1$ is the upstream flow of $X_2$, then $\Pr\{X_1 > x\} > \Pr\{X_2 > x\}$ for 
any $x$. The victim collects all the marked packets from the routers and reconstructs 
the attack tree based on the traffic rates of different routers. This traceback method 
heavily depends on the queuing model, and it requires the traffic flows to obey 
specific patterns, e.g., the Poisson distribution. 

In [5], the randomize-and-link approach to implementing IP traceback based on 
the probabilistic packet marking mechanism was proposed. The algorithm targets 
two aspects: reconstructing the marks from the markers efficiently and making 
PPM more secure against hackers' pollution. The idea is to have every router $X$ 
fragment its unique message $M_x$ (e.g., its IP address) into several pieces, $M_0, M_1, \ldots, M_{l-1}$. 
At the same time, the router calculates the checksum $C = C(M_x)$, named the cord. 
The router assembles the marks $b_i$, and injects a randomly chosen $b_i$ into the unused IPv4 
packet header space (say, $N$ bits, which is 25 bits in the paper: 16 bits of fragmentation 
ID, 1 bit of the fragmentation index, and 8 bits of service type, all of which are 
rarely used in a common IPv4 packet). Each $b_i$ includes three parts: an index of the 
piece ($\log l$ bits), a large checksum $C = C(M_x)$ ($N - \log l - |M_i|$ bits), and a piece 
$M_i$, $i = 0, 1, \ldots$ ($|M_i|$ bits). The cord is quite large, for example, 14 out of 25 bits; 
therefore, we can treat the cord as a random number, which is hard for hackers to 
predict. The victim can reconstruct the message efficiently by checking the cord and 
the index sequence. 

Yaar et al. [8] studied the marking technique to improve the PPM mechanism. 
They broke the 16-bit marking space into three parts: 1 bit for distance, 2 bits for 
the fragmentation index, and a hash fragment of 13 bits. With this modification, the 
proposed FIT algorithm can trace back the attack paths with high probability after 
receiving only tens of packets. The FIT algorithm also performs well even in the 
presence of legacy routers, and it scales to thousands of attack 
sources. 

Snoeren et al. [9] proposed a method based on logging packets, or digests of packets, 
at routers. The packets are digested using Bloom filters at all the routers. Based on 
this logged information, the victim can trace back to the leaves of an attack tree. The 
method can even trace back a single packet. However, it places a significant 
strain on the storage capability of intermediate routers. 

In [10], two hybrid schemes were proposed, Distributed Link-List Traceback (DLLT) and 
Probabilistic Pipelined Packet Marking (PPPM), which combine the packet marking 
and packet logging methods to trace back the attack sources. The 
first one preserves the marking information at intermediate routers in a specific 
way so that it can be collected using a link-list-based approach. The second 
algorithm propagates the IP addresses of the routers that were involved in 
marking certain packets by loading them into packets going to the same destination, 
thereby preserving these addresses while avoiding the need for long-term storage 
at the intermediate routers. 

At the end of this subsection, we list the disadvantages of PPM based traceback 
schemes. 


1. First of all, the PPM strategy targets on traceback at the victim end, and it only 
operate in a local range of the Internet where the victim seats. Due to the anarchy 
nature of the Internet, it is very difficult to organize a large collaboration among 
different ISPs. Therefore, PPM is carried out generally within a small range, and 
we cannot traceback to the attack sources located out of the controlled domains. 

2. Secondly, as attackers can send spoofed marking information to the victim to 
mislead the victim. The accuracy of PPM is another problem because the marked 
messages by the routers who are closer to the leaves (which means far away from 
the victim) could be overwritten by the downstream routers on the attack tree. 

3. Thirdly, most of the PPM algorithms suffer from the storage space problem on 
storing a large amount of marked packets for reconstructing the attack tree. 

4. Fourthly, PPM requires a large number of Internet routers to participate in the 
traceback process. 

Different from the PPM method, the DPM scheme is deterministic and tries to mark 
packets at the routers that are closest to the attack sources (ideally, at the 
router of the LAN where the bots reside). This scheme was first proposed by 
Belenky and Ansari [3], and then further developed in [1, 11]. Under the DPM 
scheme, a victim can identify an attack source from a few marked packets from that 
source. Compared with the PPM schemes, the DPM schemes reduce the pressure on 
storage and computing power at the victim side. 

Belenky and Ansari [3] noticed that the PPM mechanism can only handle large 
flooding attacks, and is not applicable to attacks consisting of a small number 
of packets. Therefore, they proposed a deterministic packet marking method for IP 
traceback. The basic idea is that the first router on the path from a source 
embeds its own IP address into each packet. The 32-bit address is split into two 
halves, and each mark occupies 17 bits (16 bits for one half of the IP address and 
1 bit as an index that tells which half it is). As a result, the victim can tell 
which router the packets came from. 
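
The 17-bit marking described above can be sketched in a few lines. This is a minimal illustration, not the implementation of [3]: the function names are hypothetical, addresses are handled as plain 32-bit integers, and choosing one of the two marks at random for each packet is our assumption about how both halves reach the victim.

```python
import random


def dpm_marks(router_ip: int):
    """Return the two 17-bit marks for a 32-bit router address."""
    high = (router_ip >> 16) & 0xFFFF
    low = router_ip & 0xFFFF
    # Index bit 0 tags the upper half, index bit 1 tags the lower half.
    return [(0 << 16) | high, (1 << 16) | low]


def mark_packet(router_ip: int, rng=random) -> int:
    # Assumption: the ingress router picks one of the two marks per packet.
    return rng.choice(dpm_marks(router_ip))


def recover_ip(marks):
    """Rebuild the router address once both halves have been observed."""
    halves = {}
    for mark in marks:
        halves[mark >> 16] = mark & 0xFFFF
    if len(halves) < 2:
        return None  # only one half seen so far
    return (halves[0] << 16) | halves[1]
```

With only two distinct marks per router, a victim needs very few packets from a source before recover_ip succeeds, which is why the scheme also works for attacks made of a small number of packets.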

In general, three fields of an IPv4 header can be used for marking: Fragment ID 
(16 bits), Reserved Flag (1 bit), and Type of Service (TOS for short, 8 bits). The 
original DPM scheme used 17 bits for marking (Fragment ID and Reserved Flag), and 
the FDPM scheme used between 16 bits (Fragment ID) and 24 bits (Fragment ID and 
TOS). We refer readers to [1, 3, 11] for the reasons why these fields can be 
used for marking. 

The basic idea of the encoding in the DPM mechanism is as follows. As shown in 
Fig. 4.2, suppose the available marking space is l bits (l = 1, 2, ...). In the DPM 
encoding schemes, l is split into three parts: d (d for ID) bits denote a unique 
ID for an ingress router, a (a for address) bits carry a part of the IP address 
of the marking router, and s (s for sequence) bits indicate the sequence or index 
of the partial IP address. It is obvious that 


l=a+d+s. (4.1) 


Fig. 4.2 The encoding mechanism of DPM schemes. The available marking space (l bits) 
is divided into a bits (source address fragment), d bits (hash digest), and s bits. 

The constraint on Eq. (4.1) is that the 2^s fragments of a bits each must cover the 
32-bit IP address, 


a \cdot 2^{s} \ge 32. (4.2) 
Usually, we have two important metrics to measure a DPM scheme: the maximum number 
of traceable sources, N_max, and the storage space, N_store. 

As scalability is an inherent hurdle of the DPM mechanism, improving N_max is 
always a hot topic. Based on Eqs. (4.1) and (4.2), we have 


N_{max} = 2^{d}. (4.3) 


The calculation of N_store is usually modeled as a coupon collection problem, 
which is explained as follows. 

Suppose there are k (k ∈ N) unique coupons to be collected. The expected total 
number of coupons that we have to draw in order to collect all of them is 


k \left( \frac{1}{k} + \cdots + \frac{1}{3} + \frac{1}{2} + 1 \right). (4.4) 


In our case, k = 2^s, so the storage cost for the DPM schemes is 


N_{store} = 2^{s} \left( \frac{1}{2^{s}} + \cdots + \frac{1}{3} + \frac{1}{2} + 1 \right). (4.5) 
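
The two metrics can be checked numerically. The sketch below is a small illustration under stated assumptions: it reads Eq. (4.3) as N_max = 2^d, and it takes k = 32 address fragments, which is the number of marked packets listed for the DPM and FDPM schemes in Table 4.1. The function names are hypothetical.

```python
from fractions import Fraction


def max_traceable_sources(d: int) -> int:
    # Assumed reading of Eq. (4.3): d ID bits distinguish 2**d ingress routers.
    return 2 ** d


def expected_packets(k: int) -> float:
    """Coupon collector expectation k * (1 + 1/2 + ... + 1/k), as in Eq. (4.4)."""
    return float(k * sum(Fraction(1, i) for i in range(1, k + 1)))


if __name__ == "__main__":
    print(max_traceable_sources(11))        # d = 11, the DPM-17 row of Table 4.1
    print(round(expected_packets(32), 2))   # storage figure for k = 32 fragments
```

With k = 32 the expectation evaluates to about 129.87, which matches the storage figure reported for the DPM and FDPM schemes in Table 4.1, and d = 11 gives the 2,048 traceable sources of the original DPM scheme.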

Jin and Yang [12] improved the ID coding of the deterministic packet marking 
scheme using a redundant decomposition of the initial router's IP address. They 
divided the IP address into three overlapping segments (bits 0-13, 9-22, and 
18-31), and then applied five different hash functions to the three segments to 
create five results. The resulting eight segments are recorded randomly in the 
outgoing packets. The victim can reassemble the source router's IP address from 
the packets it has received. 

Xiang et al. [1] noticed the scalability disadvantage of the original DPM 
scheme, and proposed a flexible deterministic packet marking (FDPM) method 
to trace attack sources. They deployed a flexible mark length strategy to 
match different network environments: the marking length is 16, 19, or 24 bits, 
depending on the underlying network protocols. Moreover, they designed a 
flexible flow-based marking scheme that adaptively changes the marking rate 
according to the workload of a participating router. FDPM significantly improved 
the maximum number of traceable sources. For example, the FDPM-19 and FDPM-24 
schemes can trace 8,192 and 262,144 sources, respectively, while the original 
DPM scheme can only trace 2,048 sources. 

We also list the limitations of the DPM mechanism at the end of this 
subsection. 


1. First of all, scalability is the greatest challenge for current DPM schemes. As 
discussed before, with only 25 spare bits available in an IPv4 packet, it is 
impossible to cover every possible source on the Internet. 

2. The DPM mechanism poses an extraordinary challenge on storage for packet 
logging for routers. 

3. Similar to the PPM schemes, DPM schemes are vulnerable to packet pollution 
from hackers. 


4.4 Marking on Demand Traceback Scheme 


The maximum number of traceable sources (N_max) is a major metric for the various 
existing DPM schemes. As described in [5], there are at least two million routers 
on the Internet, and the current DPM schemes cannot cover all of them. 
Defenders can only trace 2,048 sources in the original DPM scheme [3]. To date, 
the best result in this respect is 262,144 traceable sources from the Flexible DPM 
scheme [1]. This means that, even with the best available DPM scheme, we can only 
trace back to around 13% (262,144 of 2,000,000) of the possible attack sources in 
terms of routers. 

The scalability problem of the current DPM schemes is rooted in their static 
encoding mechanism. All the current DPM schemes are designed under an implicit 
assumption: every Internet router may be involved in a DDoS attack. Therefore, 
they have to assign a unique and static ID to each router of the Internet. However, 
the available space in the IPv4 packet header is limited, and cannot provide a 
unique ID for every Internet router. 

In order to address the scalability problem of DPM mechanism, we proposed a 
Marking on Demand (MOD) scheme to dynamically assign marking IDs to DDoS 
attack related routers to perform traceback tasks [13]. 

The method is based on two characteristics of DDoS attacks. 


1. In terms of space, most current DDoS attacks are organized by botnets 
[14-16], and for an attack session, the number of bots (compromised computers) 
involved is in the hundreds or a few thousands [17]. This means that only a 
small number of routers are involved in any given attack, and the IDs that the 
current DPM schemes assign to routers that are not involved in attacks are 
wasted. 

2. In terms of time, a DDoS attack session is usually short and the attack frequency 
of a botnet is low [18]. 


Based on these two facts, we only need to assign unique marks to the attack 
related routers for a given attack session at a given time point. In other words, 
we can take advantage of differences in space and time to significantly extend 
the scalability of the DPM mechanism. 

As we know, DDoS attacks are usually accompanied by a surge in the number of 
packets addressed to victims. Because detection sensitivity is limited, we can 
detect DDoS attacks only when the increase in attack packets is sufficiently 
large. This surge is generally easy to catch at the victim end, but hard to detect 
at the original LANs where the bots sit. 

In the proposed traceback framework, we set up a global mark distribution server 
(MOD server). At every local router or gateway of the participating Internet 
domains, we install a DDoS attack detector to monitor network flows. When 
suspicious network flows appear, the detector requests unique IDs from the MOD 
server, and embeds the assigned IDs to mark the packets of the suspicious flows. 
At the same time, the MOD server deposits the IP address of the requesting router 
and the assigned marks into its MOD database. Once a victim confirms a DDoS attack, 
it can extract the unique IDs from the attack packets and search the MOD database 
to identify the IP addresses of the attack sources. 


4.4.1 The Framework of Marking on Demand Scheme 


First of all, we make the following definition. 


Definition 4.4.1. Network Flow. We define the network packets that share the same 
destination address and the same source address as a Network Flow or Flow. 


Based on this definition, if there are n bots hosted by n different computers in 
a local area network, and they all target the same victim (the same destination IP 
address), then there are n attack flows in the system. On the other hand, if one 
compromised computer hosts multiple bots, and they pump traffic to m different 
victims, then we have m flows in the system. 
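
Definition 4.4.1 amounts to grouping packets by their (source, destination) pair. The short sketch below illustrates the two cases just described; the function name and the example addresses are hypothetical.

```python
def count_flows(packets):
    """Count flows, where packets is an iterable of (source_ip, destination_ip)."""
    return len({(src, dst) for src, dst in packets})


# n = 3 bots on three different computers, all targeting the same victim.
same_victim = [("10.0.0.1", "victim"), ("10.0.0.2", "victim"),
               ("10.0.0.3", "victim"), ("10.0.0.1", "victim")]

# One compromised computer sending traffic to m = 2 different victims.
one_host = [("10.0.0.9", "victim-a"), ("10.0.0.9", "victim-b"),
            ("10.0.0.9", "victim-a")]
```

Repeated packets between the same pair of addresses belong to the same flow, so the first list contains three flows and the second contains two.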

The system diagram of the Marking on Demand framework is shown in Fig. 4.3. 
In this new framework, we have one global MOD server, which assigns unique 
marks responding to requests. Moreover, the MOD server also possesses a web 
based database, which stores the mark information for possible retrieval. 

In detail, all the collaborating gateways (ideally all the routers of the Internet) 
install a DDoS attack detector, which monitors the outgoing network flows. The 
proposed traceback system works as follows. 


1. When there is a suspicious surge of volume of network flows, the monitor submits 
a request to the MOD server for a unique mark (step 1 in the diagram). 

2. The MOD server identifies a unique mark to serve the request, and deposits the 
related information (the mark, request source IP address, time stamp) into the 
database (step 2 in the diagram). 

3. The gateway uses the assigned mark to fill the available marking fields of the 
suspicious outgoing traffic. 

4. Once a victim confirms that it is under a DDoS attack, it extracts the marks from 
the attack packets, and submits a query about the source IPs of the related marks 
(step 3 in the diagram). 

5. The MOD server checks its database for the marks, and responds to the request 
with the related IP addresses. In this way, the victim learns the attack sources 
(step 4 in the diagram). 


Fig. 4.3 The framework of the marking on demand scheme 
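
The request and lookup steps of the framework can be sketched as follows. This is a minimal illustration under our own assumptions, not the implementation of [13]: the class ModServer, its method names, the sequential mark allocation, and the in-memory dictionary standing in for the MOD database are all hypothetical.

```python
import time


class ModServer:
    """Toy mark distribution server backed by an in-memory dictionary."""

    def __init__(self, mark_bits=16):
        self.capacity = 2 ** mark_bits
        self.next_mark = 0
        self.database = {}  # mark -> (requesting router IP, time stamp)

    def request_mark(self, router_ip: str) -> int:
        # A detector asks for a mark; the server issues an unused one
        # and records who asked for it and when.
        if self.next_mark >= self.capacity:
            raise RuntimeError("marking space exhausted")
        mark = self.next_mark
        self.next_mark += 1
        self.database[mark] = (router_ip, time.time())
        return mark

    def lookup(self, marks):
        # The victim submits marks taken from attack packets and
        # receives the IP addresses of the routers that requested them.
        return {self.database[m][0] for m in marks if m in self.database}
```

A gateway that detects a suspicious surge would call request_mark once and stamp the returned value into the marking fields of the suspicious flow. Because the server never reuses a mark in this sketch, it eventually raises an error, which corresponds to running out of unique IDs in the limited marking space.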


The comparison of the MOD method and the traditional DPM schemes is as 
follows. As shown in Fig. 4.4, in the DPM and the FDPM schemes, the available 
marking space is split into three parts: d (d for ID) bits denote a unique ID 
for an ingress router, a (a for address) bits carry a part of the IP address, 
and s (s for sequence) bits indicate the sequence of the partial IP address. 
In the MOD scheme, by contrast, the marking space carries only the mark assigned 
by the MOD server, so a single marked packet is enough to look up the source. 


4.4.2 System Analysis of the MOD Scheme 


For the proposed MOD attack source traceback scheme, it is important to understand 
the attack frequency and attack duration at the level of the whole Internet. 

The available marking length in an IPv4 header is quite important for the 
performance of every DPM based scheme. Theoretically, in our MOD scheme, we 
can use 25 bits as the maximum and 16 bits as the minimum. For the sake of 
comparison, we take the marking length as 24 bits at most and 16 bits at 
least for the MOD scheme in our analysis and comparison. 

Fig. 4.4 The comparison between the proposed marking on demand scheme 
(field labels: a bits, d bits, s' bits) 

For a given time point t, let the random variable A(t) be the number of ongoing 
DDoS attacks, D(t) be the duration of a given attack, and B(t) be the number of 
bots involved in an attack. For example, let A_i(t) represent the ith attack of 
A(t); then its duration is denoted as D_i(t) and its number of bots as B_i(t). Let 
N(t) be the total number of attack sources on the Internet at time point t. Then 
we have 


N(t) = \sum_{i=1}^{A(t)} B_i(t). (4.6) 


Let E[A(t)], E[B(t)], and E[D(t)] be the expectations of the random variables A(t), 
B(t), and D(t), respectively. Based on Wald's theorem, we have 


E[N(t)] = E[A(t)] \cdot E[B(t)]. (4.7) 


As the marking space l is limited, we will run out of unique IDs for marking as 
hackers continue to organize DDoS attacks. Let T_I (I stands for time interval) be 
the time interval during which we can assign unique IDs. Then the number of unique 
marks that we have for traceback needs during this time interval is 


(4.8) 

Table 4.1 Quantity comparison of the MOD scheme with the DPM and the FDPM schemes 

Scheme     Maximum traceable sources   Marked packets required   Storage required 
DPM-17     2^11                        32                        129.87 
FDPM-16    2^10                        32                        129.87 
FDPM-24    2^18                        32                        129.87 
MOD-16     2^16                        1                         1 
MOD-17     2^17                        1                         1 
MOD-24     2^24                        1                         1 


Table 4.2 Key statistics on DDoS attack characteristics 

Feature   Attack frequency [18]   Attack duration [18]   Attack rate [18]   Sources per attack session [17] 
Value     6,272/h                 5 min                  500 pkts/s         Around 1,000 


The quantity comparison among the MOD, the DPM, and the FDPM schemes is 
summarized in Table 4.1. 

From Table 4.1, we can see that in terms of the maximum number of traceable 
sources, the MOD scheme achieves 64 (2^6) times that of the DPM or the FDPM 
schemes with the same available marking space. At the same time, we note that the 
storage cost for achieving N_max in the DPM and the FDPM schemes is around 130 
times that of the MOD scheme. More importantly, the MOD scheme can achieve single 
packet traceback. 

We summarize the key statistics of DDoS attacks in a global scenario from highly 
cited literature [17, 18], and present them in Table 4.2. 

For a given time point t, we can calculate the average number of active bots as 
follows. If f denotes the attack frequency (attacks launched per unit time), the 
expected number of ongoing attacks is E[A(t)] = f \cdot E[D(t)], and Eq. (4.7) gives 


E[N(t)] = f \cdot E[D(t)] \cdot E[B(t)]. (4.9) 


Combining Eq. (4.9) and the parameters in Table 4.2 (6,272 attacks per hour, a 
duration of 5 min, and around 1,000 sources per attack), we estimate that around 
523,000 bots are concurrently active in attacks. We note that these active bots 
belong to different botnets and different network domains. 
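
One way to reproduce the estimate of around 523,000 from Table 4.2 is sketched below. It rests on an assumption of ours: the attack frequency is read as an arrival rate, so the expected number of ongoing attacks is the frequency multiplied by the mean duration. The function name is hypothetical.

```python
def expected_concurrent_bots(attacks_per_hour, mean_duration_min, bots_per_attack):
    """Expected number of bots active at the same time.

    Assumption: ongoing attacks = arrival rate x mean duration.
    """
    concurrent_attacks = attacks_per_hour * (mean_duration_min / 60.0)
    return concurrent_attacks * bots_per_attack


# Parameters from Table 4.2: 6,272 attacks/h, 5 min, around 1,000 sources.
estimate = expected_concurrent_bots(6272, 5, 1000)
```

With these inputs, roughly 523 attacks are in progress at any moment, each with about 1,000 sources, which gives the figure quoted in the text after rounding.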

Recall that our target is to identify the routers that are closest to the bots. In 
other words, we want to know how many domains, N_d(t), are involved in attacks at 
time t given N(t). With N_d(t) in hand, we can arrange our encoding for traceback. 
In order to calculate N_d(t) from N(t), we need one more piece of information: the 
distribution of bots over domains. To date, researchers are not clear about this 
distribution, except that it is non-uniform [19]. At the same time, size 
distributions are often found to follow a power law, such as the population of 
cities in a country or personal income in a nation [20, 21]. 

Therefore, it is acceptable to assume that the size distribution of botnets follows 
a power law for the analysis. Further, we use the Zipf distribution as an instance 
of the power law, which is defined as follows. 


Pr\{x = i\} = \frac{C}{i^{\alpha}}, (4.10) 


where α is a positive parameter, C is the normalizing constant, Pr{x = i} 
represents the probability of the ith (i = 1, 2, ...) largest botnet in terms of 
size, and \sum_i Pr{x = i} = 1. 

Suppose that at a given time point t, the N(t) concurrent bots come from k LANs, 
so that we need to trace back to k routers. We sort the k LANs by size in 
descending order, and let the kth LAN host only one bot. Then the parameter k 
is decided by the following conditions. 


\sum_{i=1}^{k} \frac{C}{i^{\alpha}} = 1, 
\frac{C}{k^{\alpha}} \cdot N(t) = 1. (4.11) 


In a simple form, k is decided by 


N(t) = k^{\alpha} \cdot \sum_{i=1}^{k} i^{-\alpha}. (4.12) 


Taking N(t) as 523,000, we show the relationship between α and k (the number 
of routers involved in traceback at a global level) in Fig. 4.5. We can see from 
this experiment that the number of routers involved in DDoS attacks within the 
whole Internet is at the level of a couple of hundred for a given time point. This 
finding is important because it relaxes the pressure on marking resources. At the 
same time, this finding supports the wide application of the MOD scheme for IP 
traceback. 
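
The relationship between α and k can be explored numerically. The sketch below assumes the reading N(t) = k^α · (1^(-α) + 2^(-α) + ... + k^(-α)) of Eq. (4.12) and searches for the smallest integer k whose right-hand side reaches N(t); the function name and the choice of a linear search are ours.

```python
def routers_to_trace(total_bots: float, alpha: float) -> int:
    """Smallest k with k**alpha * sum(i**-alpha for i in 1..k) >= total_bots."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    k = 0
    partial_sum = 0.0
    while True:
        k += 1
        partial_sum += k ** -alpha
        # The left-hand side grows with k, so the first hit is the answer.
        if k ** alpha * partial_sum >= total_bots:
            return k


if __name__ == "__main__":
    for alpha in (2.0, 2.5, 3.0):
        print(alpha, routers_to_trace(523_000, alpha))
```

Under this reading, larger values of α concentrate the bots in fewer LANs, so k falls as α grows. For N(t) = 523,000 and α = 2.5, the search returns a k in the low hundreds.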

Based on Eq. (4.8), we know that T_I depends on the attack duration (D(t)), the 
number of different LANs involved in attacks (k), and the length of the available 
marking space (l). 


-D(t). (4.13) 


We are interested in the time interval (T_I) after which the marking space runs out 
under various conditions. It is easy to see from Eq. (4.13) that E[T_I] is 
proportional to the average attack duration E[D(t)]. A shorter E[D(t)] exhausts 
the marking space faster. Given a reasonable D(t), we want to estimate how T_I 
varies with the number of LANs involved in attacks. Figure 4.6 presents a rough 
idea of T_I. 

Fig. 4.5 The relationship between parameter α and the number of routers to be traced 

Fig. 4.6 The relationship among T_I, the number of concurrent attack LANs (k), and the 
average attack duration (D) in the case of marking space l = 16 

From Fig. 4.6, we can see that the 16-bit marking space can last more than 1.5 
days in a relatively strict condition: k = 150 and E[D] = 1 min. This means that we 
need to reuse the same ID around every 1.5 days, which may cause problems in terms 
of accuracy. We can assign unique marks for more than 10 days in a reasonable 
condition: k = 110 and E[D] = 5 min. Usually, a DDoS attack does not last that 
long (10 days). We also note that for the l = 24 case, T_I is around 256 (2^8) 
times longer than in the l = 16 case. In this case, the accuracy issue is 
dramatically reduced. 


4.5 Network Traffic Based IP Traceback 


In our previous work [4], we proposed a novel mechanism for IP traceback using 
information theoretical parameters, with no packet marking involved. Using this 
strategy, we can avoid the inherent shortcomings of the packet marking 
mechanisms. 

The method is flow based and works at the network layer. The definition of flow is 
a bit different from the previous ones. In this context, the flow is defined as follows. 


Definition 4.5.1. Flow. For a given router, the packets, which come from the same 
upstream router and share the same destination IP address, are categorized as one 
flow. 


In the traceback strategy, we use flow entropy variation or entropy variation 
interchangeably. Once a DDoS attack has been identified, the victim initiates the 
following pushback process to identify the locations of zombies: the victim first 
identifies which of its upstream routers are in the attack tree based on the flow 
entropy variations it has accumulated, and then submits requests to the related 
immediate upstream routers. The upstream routers identify where the attack flows 
came from based on their local entropy variations that they have monitored. Once 
the immediate upstream routers have identified the attack flows, they will forward 
the requests to their immediate upstream routers, respectively, to identify the 
attacker sources further. This procedure is repeated in a parallel and distributed 
fashion until it reaches the attack source(s) or the discrimination limit between 
attack flows and legitimate flows is satisfied. 

The analysis, experiments, and simulations demonstrate that the entropy variation 
based traceback mechanism is effective and efficient compared with the existing 
methods [7, 18]. In particular, it possesses the following advantages. 


1. The strategy is fundamentally different from the existing PPM or DPM traceback 
mechanisms, and it outperforms the available PPM and DPM methods. Because 
of this essential change, the strategy overcomes the inherent drawbacks of packet 
marking methods, such as limited scalability, huge demands on storage space, 
and vulnerability to packet pollution. 

2. The implementation of the method requires no modification of current routing 
software. Both PPM and DPM require updates to the existing routing software, 
which are extremely hard to achieve on the Internet. In contrast, our method 
can work independently as an additional module on routers for monitoring 
and recording flow information, and for communicating with upstream and 
downstream routers when the pushback procedure is carried out. 

3. The method will be effective for future packet flooding DDoS attacks because 
it is independent of traffic patterns. Some previous works [7] depend heavily 
on traffic patterns to conduct their traceback. For example, they expected 
that traffic patterns obey Poisson distribution or Normal distribution. However, 
traffic patterns have no impact on the scheme; therefore, we can deal with any 
complicated attack patterns, even legitimate traffic pattern mimicking attacks. 

4. The method can achieve real-time traceback to attackers. Once the short-term 
flow information is in place at routers, and the victim notices that it is under 
attack, it starts the traceback procedure. The workload of traceback is 
distributed, and the overall traceback time mainly depends on the network delays 
between the victim and the attackers. 


4.5.1 System Model for IP Traceback on Entropy Variations 


In order to clearly describe our traceback mechanism, we use Fig. 4.7 as a sample 
network with DDoS attacks. In a DDoS attack scenario, as shown in Fig. 4.7, the 
flows destined for the victim include legitimate flows, such as f3, and 
combinations of attack flows and legitimate flows, such as f1 and f2. Compared with 
non-attack cases, the volumes of some flows increase significantly in a very short 
period of time in DDoS attack cases. Observers at routers R1, R4, R5, and V will 
notice the dramatic changes. However, the routers that are not on the attack 
paths, such as R2 and R3, will not be able to sense the variations. Therefore, 
once the victim realizes that an attack is ongoing, it can push back to the LANs 
that caused the changes based on the information of flow entropy variations, and 
thereby identify the locations of the attackers. 

The traceback can be done in a parallel and distributed fashion in the scheme. 
In Fig. 4.7, based on its knowledge of entropy variations, the victim knows that 
attackers are somewhere behind router R1, and no attackers are behind router R2. 
Then the traceback request is delivered to router R1. Similar to the victim, router 
R1 knows that there are two groups of attackers: one group is behind the link to 
LAN0 and the other is behind the link to LAN1. Then the traceback requests 
are further delivered to the edge routers of LAN0 and LAN1, respectively. Based on 
the entropy variation information of router R3, the edge router of LAN0 can infer 
that the attackers are located in the local area network, LAN0. Similarly, the 
edge router of LAN1 finds that there are attackers in LAN1; furthermore, there are 
attackers behind router R4. The traceback request is then passed further to the 
upstream routers, until we locate the attackers in the LANs. 

Fig. 4.7 A sample network with DDoS attacks at the Internet level 

Entropy is an information theoretic concept, which is a measure of randomness. 
We employ entropy variation to measure changes in the randomness of flows at a 
router for a given time interval. We note that entropy variation is only one of the 
possible metrics. Chen and Hwang used a statistical feature, the change point of 
flows, to identify the abnormality of DDoS attacks [22]. However, attackers could 
evade this feature by increasing the attack strength slowly. We could also employ 
other statistical metrics to measure the randomness, such as the standard deviation 
or high-order moments of flows. We choose entropy variation because of its low 
computing workload. 


4.5.2 System Analysis on the Model 


Let us take a close look at the flows of a router, as shown in Fig. 4.8. 
Generally, a router knows its local topology, e.g., its upstream routers, the local 
area network attached to the router, and the downstream routers. 

Fig. 4.8 Traffic flows at a router on an attack path 

We call the router that we are investigating the local router. We denote a flow 
on a local router by <u_i, d_j, t>, i, j ∈ I, t ∈ R, where u_i is an upstream router 
of the local router R_i, d_j is the destination address of a group of packets that 
are passing through the local router R_i, and t is the current time stamp. For 
example, the local router R_i in Fig. 4.8 has two different incoming flows: the 
ones from the upstream routers R_j and R_k, respectively. We call flows of this kind 
transit flows. Another type of incoming flow of the local router R_i is generated 
at the local area network; we call these local flows, and use L to represent them. 
We call all the incoming flows input flows, and all the flows leaving router R_i 
output flows. We denote u_i (i ∈ I) as the immediate upstream routers of the local 
router R_i, and U as the set of incoming flows of router R_i. Therefore, 
U = {u_i, i ∈ I} + {L}. 

We use a set D = {d_j, j ∈ I} to represent the destinations of the packets that are 
passing through the local router R_i. If v is the victim router, then v ∈ D. 
Therefore, a flow at a local router can be defined as 


f_{ij}(u_i, d_j) = \{ <u_i, d_j, t> \mid u_i \in U, d_j \in D, i, j \in I \}. (4.14) 


We denote |f_ij(u_i, d_j, t)| as the number of packets of the flow f_ij at time t. 
For a given time interval ΔT, we define the variation of the number of packets for a 
given flow as follows. 


N_{ij}(u_i, d_j, t + \Delta T) = |f_{ij}(u_i, d_j, t + \Delta T)| - |f_{ij}(u_i, d_j, t)|. (4.15) 


If we set |f_ij(u_i, d_j, t)| = 0, then N_ij(u_i, d_j, t + ΔT) is the number of packets 
of flow f_ij that went through the local router during the time interval ΔT. In 
order to keep the presentation tidy, we use N_ij(u_i, d_j) to replace 
N_ij(u_i, d_j, t + ΔT) in the rest of this section. 

Based on the law of large numbers, we have the probability of each flow at a 
local router as 


p_{ij}(u_i, d_j) = \frac{N_{ij}(u_i, d_j)}{\sum_{i} \sum_{j} N_{ij}(u_i, d_j)}, (4.16) 


where p_ij(u_i, d_j) gives the probability of the flow f_ij over all the flows on 
the local router, and \sum_i \sum_j p_ij = 1. 

Let F be the random variable of the number of flows during the time interval ΔT 
on a local router. We define the entropy of flows for the local router as 
follows. 


H(F) = -\sum_{i,j} p_{ij}(u_i, d_j) \log p_{ij}(u_i, d_j). (4.17) 


In order to differentiate it from the original definition of entropy, we call H(F) 
the flow entropy, which measures the variations of randomness of flows on a given 
local router. 

For a local router, suppose that the number of flows is N, and the probability 
distribution is {p_1, p_2, ..., p_N}. We can simplify the expression of entropy in 
Eq. (4.17) as follows. 


H(F) = H(p_1, p_2, \ldots, p_N) = -\sum_{i=1}^{N} p_i \log p_i. (4.18) 


Based on the characteristics of the entropy function, we obtain the lower bound 
and upper bound of H(F) as follows. 


0 \le H(F) \le \log N. (4.19) 


We reach the lower bound when p_i = 1 for some i (1 ≤ i ≤ N) and p_k = 0 for 
k = 1, 2, ..., N, k ≠ i. We reach the upper bound when p_1 = p_2 = ... = p_N. 

Based on our definition of the random variable of flows, we have the following 
special cases to reach the lower bound and the upper bound, respectively. When 
there is only one flow alive during the sampling time interval, and there are no 
packets going through the local router for the other flows, H(F) = 0. When the 
number of packets for each flow is the same among all the flows at a local router, 
then we have H(F) = log N. 
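
The flow entropy and its two bounds can be checked with a few lines of code. The sketch below is an illustration only: it takes the per-flow packet counts of one sampling interval as input, uses the natural logarithm, and the function name is hypothetical.

```python
import math


def flow_entropy(packet_counts):
    """Flow entropy of one sampling interval, from per-flow packet counts."""
    counts = list(packet_counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts:
        if count > 0:  # flows without packets contribute nothing
            p = count / total
            entropy -= p * math.log(p)
    return entropy


single_flow = flow_entropy([500, 0, 0, 0])       # only one flow is alive
balanced = flow_entropy([100, 100, 100, 100])    # equal packet numbers
skewed = flow_entropy([5000, 100, 100, 100])     # one flow dominates
```

The single active flow gives the lower bound of 0, the balanced case gives the upper bound log N with N = 4, and a dominant flow pulls the entropy well below the balanced value, which is the variation the traceback scheme looks for.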

We divide our time line into two segments for the following investigation: before 
a DDoS attack and under a DDoS attack. The local router's flow entropy is, 
therefore, denoted by H^-(F) and H^+(F), respectively. Let C be the mean of H^-(F) 
and δ be the standard deviation of H^-(F); δ also serves as a reasonable threshold. 
We know that H^-(F) is quite stable over a long time period. We adjust our 
threshold δ so that the following inequality holds with high probability, 


|H^{-}(F) - C| \le \delta. (4.20) 


In order to make the mean C and the standard deviation δ adaptive to the network 
traffic variations, let 


C[t] = \sum_{i=1}^{n} \alpha_i C[t - i], \quad \sum_{i=1}^{n} \alpha_i = 1, 
\delta[t] = \sum_{i=1}^{n} \beta_i \delta[t - i], \quad \sum_{i=1}^{n} \beta_i = 1, (4.21) 


where C[t] represents the current mean, C[t - i] is the mean of the ith sample 
instance in the near past, and α_i (i = 1, 2, ..., n) are the weights for the n past 
samples, respectively. In order to reflect the most recent changes, let α_i > α_j 
for i < j, i, j ∈ I. The values of α_i are fixed and can be decided by experiments 
on non-attack cases. The same applies to δ[t], δ[t - i], and β_i, respectively. The 
updates are suspended while a DDoS attack is ongoing. 
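
The adaptive update of the mean and the threshold can be sketched as a weighted average over the n most recent samples. The code below is a minimal illustration under our own assumptions: the function names are hypothetical, the history lists are ordered from the newest sample to the oldest, and the weights are supplied by the caller.

```python
def weighted_average(recent_first, weights):
    """Weighted sum of past samples; recent_first[0] is the newest sample."""
    if len(recent_first) != len(weights):
        raise ValueError("need one weight per past sample")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if any(a <= b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must decrease with sample age")
    return sum(w * x for w, x in zip(weights, recent_first))


def update_baseline(mean_history, dev_history, alphas, betas, current, under_attack):
    """Return the (mean, deviation) pair to use for the next interval."""
    if under_attack:
        # The update is suspended while an attack is ongoing.
        return current
    return (weighted_average(mean_history, alphas),
            weighted_average(dev_history, betas))
```

Giving the newest sample the largest weight makes the baseline follow slow, legitimate changes in traffic, while freezing it during an attack prevents the attack traffic from being absorbed into the baseline.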

If an attack flow is going through a local router, then the following inequality 
holds with high probability, 


|H^{+}(F) - C| > \delta. (4.22) 


Moreover, we know that the reason behind this is that the packet numbers of flows 
<u_i, v> (u_i ∈ U) increase significantly. In order to find the immediate sources of 
the attack flows among the upstream routers, we sort the flows <u_i, v> (u_i ∈ U) by 
their numbers of packets, N_ij(u_i, v). We calculate the flow entropy iteratively, 
taking the suspicious flows out one by one, starting with the flow that has the 
greatest packet number, until the difference between the flow entropy of the 
remaining flows and the mean is less than or equal to the threshold δ. In other 
words, the process stops when the following inequality holds, 


|H^{+}(F \setminus \max(\{<u_i, v>\})) - C| \le \delta, (4.23) 


where F \ max({<u_i, v>}) means taking the maximum element of set {<u_i, v>} 
out of set F. Then the subset {u_i} ⊂ U, which includes the upstream routers that we 
have taken out before Eq. (4.23) holds, is the set of suspicious immediate sources of 
the DDoS attack. The traceback requests are then forwarded to the elements 
of set {u_i}, respectively. The traceback processing terminates under the following 
conditions. 


L = \max(\{<u_i, v>\}), 
|H^{+}(F \setminus L) - C| \le \delta. (4.24) 


Then L, the flows of the local area network, is the attack source of that branch on 
the attack tree. 
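
The local step of the traceback, in which a router removes the largest flows towards the victim until the flow entropy returns to its normal range, can be sketched as follows. This is a minimal illustration and not the implementation of [4]: the function names, the dictionary layout of the flow table, and the example numbers are hypothetical.

```python
import math


def flow_entropy(packet_counts):
    counts = [c for c in packet_counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts)


def suspicious_sources(flows, victim, mean, delta):
    """flows maps (upstream, destination) to a packet count for one interval.

    Returns the upstream identifiers whose flows to the victim were removed
    before the entropy of the remaining flows came within delta of the mean.
    """
    remaining = dict(flows)
    suspects = []
    to_victim = sorted(
        (key for key in flows if key[1] == victim),
        key=lambda key: flows[key],
        reverse=True,
    )
    for key in to_victim:
        if abs(flow_entropy(remaining.values()) - mean) <= delta:
            break
        del remaining[key]          # take the largest suspicious flow out
        suspects.append(key[0])     # remember where it came from
    return suspects


# Hypothetical flow table: "L" stands for the local area network.
flows = {("R1", "v"): 5000, ("R2", "v"): 100, ("L", "v"): 100,
         ("R1", "x"): 100, ("R2", "y"): 100}
```

In this example the flow from R1 to the victim dominates, so removing it alone brings the entropy back within the threshold and the traceback request would be forwarded only to R1. If the identifier returned is L, the attack source of that branch lies in the local area network and the traceback stops there.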
Chapter 5 
DDoS Attack and Defence in Cloud 


Abstract In this chapter, we explore DDoS attack and defence on the emerging 
dominant platform of cloud computing. We first answer the question of whether we 
can beat DDoS attacks in the cloud given their current attack capability, and at 
what cost. We also discuss a possible architecture of a cloud firewall 
against DDoS attacks. 


5.1 Introduction 


Today, cloud computing has become one of the fastest growing sectors in the IT 
industry all over the world. Cloud computing features a cost-efficient, 
“pay-as-you-go” business model and flexible architectures, such as SaaS, PaaS and 
IaaS [1]. A cloud platform can dynamically clone virtual machines very quickly, 
e.g. duplicating a gigabyte level server within 1 min [2]. Despite the promising 
business model and the hype surrounding cloud computing, security is the major 
concern for businesses shifting their applications to clouds [3, 4]. 

After many years of effort, defenders have realized that DDoS attack and defence
is essentially a resource competition problem, and the winner is the side that
possesses more resources than its opponent. For a non-cloud environment, the
current DDoS mitigation techniques depend on either sufficiency of resources or
collaboration among different organizations. Beyond sending a large number of attack
packets, a DDoS attack may be carried out in various forms. As a result,
detection has to run through many different methods, targeting features such as IP
spoofing [5], hop-count [6], packet score [7], and flash crowd mimicking [8, 9].
Therefore, DDoS detection is expensive in terms of computing power.

Yau et al. [10] viewed DDoS attacks as a resource management issue in their 
DDoS defence proposal. They proposed the installation of a router throttle at 
selected upstream routers of a possible victim. The participant routers regulate the 
traffic flows to the protected server in a proactive way using a level-k max-min 
fairness strategy. Their target was to constrain the number of attack packets far
away from the protected server. If we have sufficient resources, such as bandwidth
and computing power, we can then perform deep packet inspection and filter
out attack packets. However, this is not usually possible for a non-cloud platform.
Chen et al. [11] proposed a DDoS attack mitigation scheme, attack diagnosis (AD),
using a divide-and-conquer strategy to address the problem. They used the pushback
strategy to enable an AD to work as close as possible to an individual attack
source. As a result, attack sources were isolated, and then throttled. Another method
is to establish an alliance among multiple network domains to protect a potential
victim. In [12], a distributed change-point detection architecture was proposed using
a change aggregation tree (CAT). Each CAT works at one network domain, and all
CATs report their traffic fluctuation to a server, with the server overseeing all the
reports to make a final decision on DDoS attacks. The authors of [13] proposed
FireCol, a distributed intrusion prevention system at the ISP level to mitigate DDoS 
attacks from large botnets. The cooperative ISPs establish virtual protection rings 
around potential victims to defend and collaborate through the exchange of selected 
traffic information. However, it is difficult to obtain either sufficient resources or 
collaboration among multiple network domains in a non-cloud environment at the 
Internet level. 

We are interested in answering the following question: how can we defeat
DDoS attacks in the cloud environment? A cloud infrastructure provider pools a
large amount of resources and makes them easy to access in order to handle a rapid
increase in service demands [1]. Therefore, it is almost impossible for a DDoS attack
to shut down a cloud. However, individual cloud customers (referred to as parties
hosting their services in a cloud) cannot escape from DDoS attacks nowadays, as
they usually do not have this resource advantage.

A variation of a DDoS attack in cloud computing is the Economic Denial of
Sustainability (EDoS) attack [14] or the Fraudulent Resource Consumption (FRC)
attack [15]. If the billing mechanism for cloud customers is “pay-as-you-use”,
botnet owners can create a large number of fake users to intensively consume
the service of the targeted cloud customer. The existing flash crowd
mimicking attacks [8, 16] on an e-business web site are a good example. As a
result, the bill for the targeted cloud customer will increase dramatically until the
victim suspends her service or is bankrupted. On the other hand, if a cloud customer
fixes her cost for renting the resources of her hosted services, then an effective DDoS
attack will disturb her services, or even shut them down.

There has been some work on mitigating DDoS attacks in a cloud computing 
environment. Lua and Yow [17] proposed the establishment of a large swarm 
network to mitigate DDoS attacks on a cloud, with an intelligent fast-flux technique 
used to transparently maintain connectivity between nodes in a swarm network, 
cloud clients and servers. Their software simulation indicated they can maintain a 
high percentage of benign request delivery rates while successfully blocking attack 
packets. Chen et al. [18] proposed an on-demand security architecture to offer different
services for different needs in cloud environments. The architecture considers three factors: the risk
of network access, the service type and the security level. Given the mechanism of
cloud computing, this is a good idea as it meets the different requirements of users.

Table 5.1 Amazon EC2 pricing for standard on-demand instances

Instance type      Linux (per hour)   Windows (per hour)
Small (default)    $0.060             $0.115
Medium             $0.120             $0.230
Large              $0.240             $0.460
Extra large        $0.480             $0.920


In order to deal with EDoS attacks, Sqalli et al. [14] proposed a whitelist and blacklist
based filtering scheme to block malicious service requests. Amazon developed
CloudWatch [19], a tool to monitor the company’s cloud resources and mitigate
EDoS attacks on their cloud customers.

Different from other computing platforms, a cloud data centre usually possesses
a significant amount of computing power and bandwidth. For example, Amazon
EC2 has almost 500,000 servers, and small instances on a server usually share
1 Gbit/s of bandwidth. In terms of financial cost, it is far cheaper to establish a web
based application within a cloud environment than in the traditional way.
In a cloud, an instance is the basic unit for renting, and it is equivalent to a PC or a
server. We list the latest pricing of Amazon EC2 standard on-demand
instances [20] in Table 5.1.
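
To make the pricing in Table 5.1 concrete, the following minimal sketch computes what it would cost to keep a number of identical on-demand instances running for a given time. The function name `rental_cost` and the example scenario are illustrative assumptions; only the hourly prices are taken from the table.

```python
# Hourly on-demand prices (USD) copied from Table 5.1.
PRICE_PER_HOUR = {
    ("small", "linux"): 0.060, ("small", "windows"): 0.115,
    ("medium", "linux"): 0.120, ("medium", "windows"): 0.230,
    ("large", "linux"): 0.240, ("large", "windows"): 0.460,
    ("extra large", "linux"): 0.480, ("extra large", "windows"): 0.920,
}


def rental_cost(instance_type: str, os_name: str, instances: int, hours: float) -> float:
    """Cost in USD of renting `instances` identical instances for `hours` hours."""
    if instances < 0 or hours < 0:
        raise ValueError("instances and hours must be non-negative")
    return PRICE_PER_HOUR[(instance_type, os_name)] * instances * hours


# Hypothetical scenario: ten small Linux instances kept running for one day.
print(f"${rental_cost('small', 'linux', 10, 24):.2f}")
```

Because the price is charged per instance and per hour, the cost grows linearly with the number of instances, which is the property the resource investment function relies on later in the chapter.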

Moreover, cloud platforms possess the unique feature of cloning virtual machines
on the fly. Of course, there is a cost for performing this function. In general, there
are two categories of methods for cloning virtual machines in a cloud: the network-driven
approach and the non-network approach. In the first category, researchers usually take
a BitTorrent-like strategy to treat an image of a virtual machine as one file, and
distribute the entire file on demand [21]. In the second category, researchers try
to take advantage of non-network techniques, such as reducing the size of a virtual
machine image, prediction and partial page launch, to speed up the initialization of
virtual machine instances [22]. Peng et al. [2] observed six production cloud data
centers for a long period of time, and proposed a chunk-level, network topology-aware
virtual machine image distribution network. The proposed method can reduce
the cloning time to the 1 min level.

As we have discussed in previous chapters, due to anti-virus and anti-malware
efforts and software, the number of active bots a botmaster can manipulate
is constrained to the level of hundreds or a few thousand, even though the number of bot
footprints may be much larger. Therefore, the attacking resource for botnet owners is
limited. As a result, it is possible for defenders to win the battle in a cloud environment
by taking advantage of the unique features of cloud platforms.


5.2 Defeat DDoS Attacks in Cloud


In our previous work [23], we proposed a practical dynamic resource allocation
mechanism to confront DDoS attacks that target individual cloud customers. In
general, there are one or more access points between a cloud data center and the
Internet. Similar to firewalls, we place our Intrusion Prevention System (IPS) at
these locations to monitor incoming packets. When a cloud hosted server is under a 
DDoS attack, the proposed mechanism will automatically and dynamically allocate 
extra resources from the available cloud resource pool, and new virtual machines 
will be cloned based on the image file of the original IPS using the existing clone 
technology [21, 22]. All IPSs will work together to filter attack packets out, and 
guarantee the quality of service (QoS) for benign users at the same time. When the 
volume of DDoS attack packets decreases, our mitigation system will automatically 
reduce the number of its IPSs, and release the extra resources back to the available 
cloud resource pool. 

As mentioned above, the essential issue in defeating a DDoS attack is to allocate
sufficient resources to mitigate the attack, no matter how efficient our detection and
filtering algorithms are. In order to estimate our resource demands and the QoS
for benign users in a DDoS battle, we employ queueing theory to undertake
performance evaluation, due to its extensive deployment in cloud performance
analysis, such as in [24].

First of all, we examine the features of a cloud hosted virtual server in a non-attack
scenario. As shown in Fig. 5.1a, similar to an independent Internet based
service, a cloud hosted service includes a server, an intrusion prevention system
(IPS in the diagram), and a buffer for incoming packets (queue Q in the diagram).
The IPS is used to protect the specific server of the hosted service. All packets of 
benign users go through the queue, pass the IPS and are served by the server. In 
general, the number of benign users is stable, and we suppose the virtual IPS and 
virtual server have been allocated sufficient resources, and therefore the quality of 
service (QoS) is satisfactory to users. 

When a DDoS attack occurs against the hosted virtual server, a large number of
attack packets are generated by botnets, and pumped into queue Q. In order to identify
these attack packets and guarantee the QoS of benign users, we have to invest more
resources to carry out the task. We propose to clone multiple
parallel IPSs to achieve the goal, as shown in Fig. 5.1b.

The number of IPSs we need to achieve our goal depends on the volume of the 
attack packets. As discussed previously, the attack capability of a botnet is usually 
limited, and the required amount of resources to beat the attack is usually not very 
large. In general, it is reasonable to expect that a cloud can manage its reserved or idle
resources to meet demand.

Fig. 5.1 (a) A cloud hosted server in a non-attack scenario. (b) A cloud hosted server under DDoS
attack with the mitigation strategy in place


5.2.1 System Model in General 


In general, we treat our studied system as a black box, and observe its input and
output with respect to time $t$. We denote the input as $a(t)$, the output as $b(t)$, and the
system function of the black box as $h(t)$. We then have a relationship among these
three functions as follows.


$$ b(t) = a(t) * h(t), \tag{5.1} $$


where $*$ is the convolution operation.

In order to obtain solutions for the output, in most cases we map $a(t)$
and $h(t)$ into another domain using a transform technique, such as the Laplace
transform or the Z-transform. We use the Laplace transform here. The Laplace
transform of $a(t)$ is defined as follows.


$$ A(s) \triangleq \int_{0}^{\infty} a(t) e^{-st} \, dt \tag{5.2} $$


Similarly, we can obtain $H(s)$ from $h(t)$. Let $B(s)$ be the Laplace transform of $b(t)$,
and we obtain $B(s)$ through the following equation.


$$ B(s) = A(s) \cdot H(s) \tag{5.3} $$


Once $B(s)$ is in place, we can calculate $b(t)$ using the inverse Laplace transform,


$$ b(t) = \frac{1}{2\pi j} \int_{\sigma - j\infty}^{\sigma + j\infty} B(s) e^{st} \, ds \tag{5.4} $$
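
As a numerical sanity check of the relationship between convolution in the time domain and multiplication in the transform domain, the sketch below takes two exponential densities as a hypothetical input $a(t)$ and system function $h(t)$, uses the known closed form of their convolution, and compares the Laplace transform of the convolution with the product of the individual transforms. The rates, the evaluation point `s` and the trapezoidal helper `laplace` are assumptions made only for illustration.

```python
import math


def laplace(f, s, upper=60.0, steps=60_000):
    """Approximate the integral of f(t) * exp(-s t) over [0, upper] by the trapezoidal rule."""
    dt = upper / steps
    total = 0.5 * (f(0.0) + f(upper) * math.exp(-s * upper))
    for i in range(1, steps):
        t = i * dt
        total += f(t) * math.exp(-s * t)
    return total * dt


lam, mu = 2.0, 5.0   # hypothetical rates, chosen only for illustration


def a(t):            # input: exponential density with rate lam
    return lam * math.exp(-lam * t)


def h(t):            # system function: exponential density with rate mu
    return mu * math.exp(-mu * t)


def b(t):            # closed-form convolution (a * h)(t) of the two densities above
    return lam * mu / (mu - lam) * (math.exp(-lam * t) - math.exp(-mu * t))


s = 1.5
lhs = laplace(b, s)                   # B(s) computed directly from b(t)
rhs = laplace(a, s) * laplace(h, s)   # A(s) * H(s)
print(round(lhs, 4), round(rhs, 4))
```

The two printed values agree up to the integration error. The exponential case is one of the few where every step has a closed form, which is part of the reason the analysis that follows falls back on exponential assumptions.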


In our case, $a(t)$ represents the arrival distribution, and $h(t)$ is the system service
distribution. In queueing theory, our studied system can be modeled as G/G/m,
namely, general arrival distribution and general service time distribution. However,
for this general model, the analysis is very complex, and we may not have
computationally tractable methods to calculate the numerical results of these
models [25]. For example, most of the time we cannot obtain $A(s)$, $H(s)$ from $a(t)$, $h(t)$,
and sometimes we cannot obtain $b(t)$ even if $B(s)$ is in place. As a result,
researchers have to approximate the complex G/G/m model by solvable models in
order to proceed with analysis and prediction. To date, only the M/M/m model
(exponentially distributed inter-arrival and service times) can offer a closed form result, as these
distributions possess convenient properties, such as additivity and memorylessness [26].
We also follow this mainstream method for our analysis of the proposed
mitigation strategy.


5.2.2 Approximation of the Model 


As widely applied in cloud performance analysis [24], we make a few reasonable
assumptions and approximations in order to make our modelling and analysis
feasible and practical. They are:


• Whether or not there is a DDoS attack, we suppose the number of benign users
  is stable, and we suppose the cloud is big enough and has sufficient reserved or
  idle resources to overcome a DDoS attack on a cloud customer.

• We suppose the arrivals to the system follow the Poisson distribution when
  a DDoS attack is ongoing. We know the arrivals of a server in a non-attack case
  obey the Poisson distribution. When a DDoS attack is ongoing, there are many
  more packets to the system, and a general conclusion from queueing theory is that
  the superposition of a large number of arrival streams can be approximated as a
  Poisson process [26]. Therefore, we use the Poisson distribution as the arrival
  distribution for both attack and non-attack cases.

• We suppose the service time of each individual IPS follows an exponential
  distribution, which is common in queueing analysis.

In order to measure the performance of the system, we use the average time in system
of packets as a metric of QoS. We denote by $T_n$ ($n$ stands for normal) the acceptable
average time in system for packets of benign users in non-attack cases. In general, $T_n$
is a constant. In attack cases, the average time in system varies because the number
of attack packets changes. Therefore, we denote it as $T_a(t)$ for a given time point $t$
($a$ stands for attack).

We note that $T_n$ and $T_a(t)$ do not include the time spent in the normal service
of the server because this time is the same for both attack and non-attack cases. In
other words, the system we study here only includes queue $Q$ and the original IPS
or multiple IPSs.

In order to guarantee the QoS of benign users in attack cases, we need to
dynamically allocate resources into the battle, and make sure $T_a(t) \le T_n$ for any
time point $t$.

We use a function $R(\cdot)$ to represent the resource investment. Let variable $x$
be the expected system performance, such as the average time in system of requests.
Obviously, $R(\cdot)$ depends on $x$ and time point $t$. We therefore denote it as $R(x, t)$. We
also simplify it as $R(x)$ or $R(t)$ if it is clear from the context.

As shown in Fig. 5.1b, we model our mitigation system as an M/M/m queue,
namely, one incoming queue with an infinite buffer size, the arrivals following
the Poisson distribution, and $m$ ($m \ge 2$) servers, each with an exponential
service time.

With the system model in hand, we can transform our mitigation problem into an
optimization problem: minimizing the resource investment $R(t)$ while guaranteeing
the QoS for benign users in attack cases. We formulate the problem as follows.


$$ \begin{aligned} &\text{minimize} \quad R(t) \\ &\text{s.t.} \quad T_a(t) \le T_n. \end{aligned} \tag{5.5} $$


5.2.3 Resource Investment Analysis 


In order to decide on the investment for an expected quality of service, we have to
define an executable investment function $R(x)$ with respect to a system performance
expectation $x$. Variable $x$ could be a vector to represent specific requirements of
different resources, such as $x = \langle \text{CPU}, \text{memory}, \text{IO}, \text{bandwidth} \rangle$.

For feasibility reasons, we define $R(x)$ as a linear and non-decreasing function.
Let $x$, $y$ be two different system performance expectations. Then we have the
following properties of this investment function.


$$ \begin{aligned} & R(x) = 0, && x = 0 && \text{(a)} \\ & R(x) \le R(y), && 0 \le x \le y && \text{(b)} \\ & R(ax + by) = aR(x) + bR(y), && a, b \in \mathbb{R}. && \text{(c)} \end{aligned} \tag{5.6} $$


In practice, the current cloud service providers (CSPs), such as Amazon EC2, offer resources in terms of
instances. An instance includes a fixed amount of various resources, e.g. memory and
IO. In other words, an instance is the basic unit for resource allocation. In this case,
Eq. (5.6) reflects this practice very well.


5.2.4 System Analysis for Non-attack Cases 


For a web based service in non-attack cases, it is generally accepted that the arrivals
at queue $Q$ follow the Poisson distribution, whose probability mass function
is defined as


$$ P\{X = k\} = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, \ldots. \tag{5.7} $$


For non-attack cases as shown in Fig. 5.1a, the system can be naturally modeled
as an M/M/1/∞ queue. We denote the packet arrival rate as $\lambda$, and the service rate
of the IPS as $\mu$.

People usually derive a parameter called the utilization rate or busy rate as the ratio of
the arrival rate to the service rate. In this case, we denote it as


$$ \rho_n = \frac{\lambda}{\mu}. \tag{5.8} $$


Usually, we need to make sure $\rho_n < 1$ in order to keep the system in a stable state.
Based on queueing theory [26], we know the probability that the system stays in state
$\pi_k$ (namely, there are $k$ packets in the system) is


$$ \begin{aligned} \pi_0 &= 1 - \frac{\lambda}{\mu}, \\ \pi_k &= \left(\frac{\lambda}{\mu}\right)^k \pi_0. \end{aligned} \tag{5.9} $$


The probability density of the time in system is


$$ P\{T = t\} = (\mu - \lambda) e^{-(\mu - \lambda)t}, \tag{5.10} $$


for $t > 0$. The average time spent in the IPS system is


$$ T_n = \frac{1}{\mu - \lambda} = \frac{1}{(1 - \rho_n)\mu}. \tag{5.11} $$


Naturally, we assume $T_n$ meets users’ expectations of service. We will use $T_n$ as a
benchmark of QoS for benign users when the cloud hosted server is under a DDoS
attack.
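
The non-attack quantities are simple enough to evaluate directly. The following is a minimal sketch of the busy rate, the first few state probabilities and the benchmark time $T_n$ of the M/M/1 model; the function name `mm1_metrics` and the sample rates are hypothetical.

```python
def mm1_metrics(lam: float, mu: float, k_max: int = 5):
    """Return (rho, [pi_0..pi_k_max], mean time in system) for a stable M/M/1 queue."""
    if lam <= 0 or mu <= 0:
        raise ValueError("rates must be positive")
    rho = lam / mu                                          # busy rate, Eq. (5.8)
    if rho >= 1:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    pi = [(1 - rho) * rho ** k for k in range(k_max + 1)]   # state probabilities, Eq. (5.9)
    t_n = 1.0 / (mu - lam)                                  # mean time in system, Eq. (5.11)
    return rho, pi, t_n


# Hypothetical workload: 800 packets/s arriving at an IPS that inspects 1000 packets/s.
rho, pi, t_n = mm1_metrics(800.0, 1000.0)
print(f"rho = {rho:.2f}, pi_0 = {pi[0]:.2f}, T_n = {t_n * 1000:.1f} ms")
```

With these sample rates the IPS is busy 80% of the time and a packet spends 5 ms on average in the queue and the IPS, which would then serve as the QoS benchmark for the attack case.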


5.2.5 System Analysis for Attack Cases 


In the case of a cloud customer being subjected to a DDoS attack as shown in 
Fig. 5.1b, and based on our proposal, the cloud will clone multiple IPSs to counter 
the attack in order to guarantee the QoS for benign users. 

It is natural that we model the mitigation system using the M/M/m model: 
Poisson arrival rate and multiple (m) servers with an exponential service rate. 

For the sake of neatness in the analysis, we make the following definition. 


Definition 5.2.1. Attack strength is the total number of arrivals to a victim for a 
given time interval when a DDoS attack is ongoing. 


From this definition, we know an attack strength includes both benign packets
and attack packets. For the sake of simplicity, we represent an attack strength as
$r$ ($r > 1$, where $r$ is a real number) times the arrival rate of non-attack cases. As
we denote the arrival rate of non-attack cases as $\lambda$, an attack strength is therefore
denoted as $r\lambda$. The service rate for each IPS is still $\mu$, as it was in the non-attack
case, and all IPSs share the workload. Once again, based on queueing theory [26],
we have the following system service rate $\mu_k$ ($k$ servers in service).


$$ \mu_k = \min[k\mu, m\mu] = \begin{cases} k\mu, & 0 \le k \le m \\ m\mu, & m \le k. \end{cases} \tag{5.12} $$


We obtain $\pi_k$ ($0 \le k < \infty$), the probability of $k$ packets in the system, as
follows.


$$ \pi_k = \begin{cases} \pi_0 \dfrac{(m\rho)^k}{k!}, & k \le m \\[2mm] \pi_0 \dfrac{\rho^k m^m}{m!}, & k \ge m, \end{cases} \tag{5.13} $$


where $\rho$ is the system busy rate, which is defined in a multiple homogeneous server
case as


$$ \rho = \frac{r\lambda}{m\mu}. \tag{5.14} $$


Similarly, we have to make sure $\rho < 1$ in order to keep the system in a stable
state.

In Eq. (5.13), $\pi_0$ represents the probability of the state in which there are
no packets in the system, including the initial state of the system. $\pi_0$ is an important
parameter in queueing analysis, and it is defined as follows in the M/M/m model.


$$ \pi_0 = \left[ \sum_{k=0}^{m-1} \frac{(m\rho)^k}{k!} + \frac{(m\rho)^m}{m!} \cdot \frac{1}{1 - \rho} \right]^{-1} \tag{5.15} $$


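Opposite to state $\pi_0$, we have $\pi_{m+}$, which is the probability that a packet has to
wait when it arrives in the system. $\pi_{m+}$ is expressed as


$$ \pi_{m+} = \sum_{k=m+1}^{\infty} \pi_k = \frac{(m\rho)^m}{m!} \cdot \frac{\rho}{1 - \rho} \, \pi_0 \tag{5.16} $$

$$ \pi_{m+} = 1 - \sum_{k=0}^{m} \frac{(m\rho)^k}{k!} \pi_0. \tag{5.17} $$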


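From a system viewpoint, we are interested in the average time spent in the
system, $T_a(t)$. Here, the number of servers, $m$, is also a factor of $T_a(t)$. We therefore
express it more explicitly as $T_a(t, m)$, which is given as follows.


$$ T_a(t, m) = E[T_a(t, m)] = \frac{1}{\mu} \left( 1 + \frac{(m\rho)^m}{m \cdot m!} \cdot \frac{\pi_0}{(1 - \rho)^2} \right) \tag{5.18} $$


Combining Eqs. (5.14) and (5.18), we have


$$ T_a = \frac{1}{\mu} \left( 1 + \frac{\left(\frac{r\lambda}{\mu}\right)^m \pi_0}{m \cdot m! \left(1 - \frac{r\lambda}{m\mu}\right)^2} \right). \tag{5.19} $$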


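As previously discussed, in order to guarantee the QoS for benign users during a
DDoS attack, the condition of Eq. (5.5) has to be satisfied. Therefore,


$$ \frac{1}{\mu} \left( 1 + \frac{\left(\frac{r\lambda}{\mu}\right)^m \pi_0}{m \cdot m! \left(1 - \frac{r\lambda}{m\mu}\right)^2} \right) \le \frac{1}{\mu - \lambda}. \tag{5.20} $$


For simplicity, let


$$ f(r, m) = \frac{\lambda}{\mu - \lambda} - \frac{\left(\frac{r\lambda}{\mu}\right)^m \pi_0}{m \cdot m! \left(1 - \frac{r\lambda}{m\mu}\right)^2}, \tag{5.21} $$


where $\pi_0$ is determined by Eq. (5.15).

Combining (5.21) and (5.20), we have the constraint for the optimization as


$$ f(r, m) \ge 0. \tag{5.22} $$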
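
The step from Eq. (5.20) to Eq. (5.22) is only a rearrangement. The short derivation below spells it out, assuming the standard M/M/m expression for the mean time in system; the shorthand $X$ is introduced here purely for brevity and is not part of the original notation.

```latex
\begin{align*}
X &\triangleq \frac{\left(\frac{r\lambda}{\mu}\right)^{m} \pi_0}
                   {m \cdot m! \left(1 - \frac{r\lambda}{m\mu}\right)^{2}}, \\
\frac{1}{\mu}\,(1 + X) \le \frac{1}{\mu - \lambda}
  &\iff 1 + X \le \frac{\mu}{\mu - \lambda} \\
  &\iff X \le \frac{\mu}{\mu - \lambda} - 1 = \frac{\lambda}{\mu - \lambda} \\
  &\iff f(r, m) = \frac{\lambda}{\mu - \lambda} - X \ge 0 .
\end{align*}
```

The first equivalence multiplies both sides by $\mu > 0$, so the direction of the inequality is preserved.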
Moreover, we note that Eq. (5.22) is subject to the following constraints


$$ \begin{aligned} & \frac{r\lambda}{\mu} < m && \text{(a)} \\ & r > 1 && \text{(b)} \\ & m = 2, 3, \ldots, && \text{(c)} \end{aligned} \tag{5.23} $$


where condition (a) comes from Eq. (5.14).

If Eq. (5.22) does not hold, then it is time to invest more resources to clone one
or more IPSs against the ongoing attack.
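
The decision rule above can be sketched as a small search for the smallest number of IPSs that keeps the attack-time delay within the non-attack benchmark. This is a minimal illustration under the M/M/m assumptions of this section, not the exact procedure of [23]; the function names, the search cap `m_max` and the sample rates are hypothetical.

```python
import math


def pi0(m: int, rho: float) -> float:
    """Probability of an empty M/M/m system, Eq. (5.15)."""
    a = m * rho                                   # offered load r * lambda / mu
    head = sum(a ** k / math.factorial(k) for k in range(m))
    tail = a ** m / math.factorial(m) / (1.0 - rho)
    return 1.0 / (head + tail)


def time_in_system(lam: float, mu: float, r: float, m: int) -> float:
    """Mean time in system T_a for m parallel IPSs under attack strength r, Eq. (5.19)."""
    rho = r * lam / (m * mu)                      # busy rate, Eq. (5.14)
    if rho >= 1.0:
        return math.inf                           # unstable: the queue grows without bound
    a = m * rho
    waiting = a ** m * pi0(m, rho) / (m * math.factorial(m) * (1.0 - rho) ** 2)
    return (1.0 + waiting) / mu


def min_ips(lam: float, mu: float, r: float, m_max: int = 1000) -> int:
    """Smallest m >= 2 with T_a <= T_n, i.e. f(r, m) >= 0; m_max is a hypothetical search cap."""
    t_n = 1.0 / (mu - lam)                        # non-attack benchmark, Eq. (5.11)
    for m in range(2, m_max + 1):
        if time_in_system(lam, mu, r, m) <= t_n:
            return m
    raise RuntimeError("no feasible m within m_max")


# Hypothetical numbers: 800 packets/s of normal traffic, 1000 packets/s per IPS, 10x attack.
print(min_ips(800.0, 1000.0, 10.0))
```

For very large $m$ the powers and factorials overflow floating point, so a production version would evaluate the terms by recurrence or in the log domain. Setting `m = 1` and `r = 1` reduces `time_in_system` to the M/M/1 result $1/(\mu - \lambda)$, which is a convenient consistency check.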
Usually, a cloud has sufficient idle or reserved resources, which can be used to
counter brute force DDoS attacks. We denote the resource for one IPS as $R_{IPS}$, and
the available reserved resources of a cloud as $R_c$. The maximum number of IPSs that we can
use is then $\lfloor R_c / R_{IPS} \rfloor$. In a strict sense, on top of the constraints in Eq. (5.23), we
need one more constraint as follows.


$$ m \le \left\lfloor \frac{R_c}{R_{IPS}} \right\rfloor + 1. \tag{5.24} $$
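
As a small illustration of Eq. (5.24), the sketch below computes the upper bound on $m$ from the reserve. It assumes that resources are measured in a common unit and reads the extra 1 as the original IPS that is already running; both the function name `max_ips` and that reading are assumptions.

```python
import math


def max_ips(reserved: float, per_ips: float) -> int:
    """Upper bound on m from Eq. (5.24): IPSs that fit in the reserve, plus the original IPS."""
    if per_ips <= 0 or reserved < 0:
        raise ValueError("per_ips must be positive and reserved non-negative")
    return math.floor(reserved / per_ips) + 1


# Hypothetical units: a reserve worth 25.5 instances and one instance per IPS.
print(max_ips(25.5, 1.0))
```

If the smallest $m$ that satisfies Eq. (5.22) exceeds this bound, the cloud cannot hold the QoS target with its current reserve.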


5.3 A Cloud Firewall Framework Against DDoS Attacks 


As we just discussed in the previous section, clouds need a firewall or something
similar to protect themselves. However, little work has been done in this
category in the literature.

Plenty of work has been done on traditional firewalls. One
topic is the efficiency of a firewall, with the aim of ensuring that the firewall does
not become a bottleneck for a given system. Rovniagin and Wool [27] modeled
the firewall packet matching problem as a mathematical point location problem,
and proposed a Geometric Efficient Matching (GEM) algorithm. Their experiments
indicated that the proposed algorithm is more space efficient for rule based firewalls.
Hu et al. [28] considered the quality of policy of firewall configuration, and
presented a firewall policy anomaly management framework, which employed a
rule based segmentation technique to identify policy anomalies and derive effective
anomaly resolutions. Salah et al. [29] proposed an embedded Markov chain based
mathematical model for rule-based firewall performance analysis, especially against
DDoS attacks. They presented closed form expressions of a number of important
performance metrics, such as mean throughput, service time, and CPU utilization. As in the
majority of similar work, they assumed that arrivals follow the Poisson
distribution and that the service times are independent and exponentially distributed.

Fig. 5.2 The framework of the dynamical firewall for cloud platforms

In our previous work [30], we proposed a framework of cloud firewall. As shown
in Fig. 5.2, the proposed cloud firewall sits between the Internet and a cloud platform.
All incoming requests are examined by detectors in sequence until a detector
reports positive. Further actions are then taken, e.g. dropping or blocking the related
requests.

In general, assume we have $N$ different detectors $D_i$ ($i = 1, 2, \ldots, N$), each of
which aims at one specific anomaly, such as viruses, different DDoS attacks, and
information phishing. The packet arrival rate is $\lambda$, and the throughput rate is
represented by $\gamma$.

It is generally accepted that request arrival follows the Poisson distribution.

For each detector $D_i$ ($i = 1, 2, \ldots, N$), we denote its service rate as $\mu'_i$. We can
easily obtain the average service time of the $N$ detectors as


$$ E[\mu'] = \frac{1}{N} \sum_{i=1}^{N} \mu'_i. \tag{5.25} $$


In the case that we have no knowledge of the anomaly distribution, we suppose
the $N$ detectors share the same probability of positive detection, $p$ ($p > 0$); then
the positive detection follows the geometric distribution. Let random variable
$X$ be the number of trials; then the probability that we obtain the first positive
detection at trial $k$ ($k \in \mathbb{N}$) is expressed as follows.


$$ \Pr\{X = k\} = p_k = (1 - p)^{k-1} p. \tag{5.26} $$

The cumulative distribution function is

$$ F[X \le k] = 1 - (1 - p)^k. \tag{5.27} $$

The mean of the geometric distribution is

$$ E[X] = \frac{1}{p}. \tag{5.28} $$


From a system point of view, we care about the average system service time,
which is


$$ \mu = \sum_{k=1}^{N} \left( p_k \sum_{i=1}^{k} \mu'_i \right). \tag{5.29} $$


Based on Wald's theorem, we can rewrite Eq. (5.29) using Eqs. (5.25)
and (5.28).


$$ \mu = E[X] E[\mu']. \tag{5.30} $$
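
The product form in Eq. (5.30) can be checked with a small Monte Carlo experiment. The sketch below assumes an unbounded chain of detectors (the number of trials is not truncated at $N$), exponentially distributed per-detector times, and detection outcomes that are independent of those times; the function name, the seed and the sample values are hypothetical.

```python
import random


def mean_total_service_time(p: float, mean_time: float,
                            samples: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the mean total time a request spends in the detector chain."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        while True:
            # One detector examines the request; its time is exponential with the given mean.
            total += rng.expovariate(1.0 / mean_time)
            if rng.random() < p:      # this detector reports positive, so the chain stops
                break
    return total / samples


p, mean_time = 0.25, 2.0              # hypothetical values, for illustration only
estimate = mean_total_service_time(p, mean_time)
predicted = (1.0 / p) * mean_time     # E[X] * E[mu'] as in Eq. (5.30)
print(f"simulated {estimate:.3f}, predicted {predicted:.3f}")
```

The simulated mean should land close to the predicted product, with the gap shrinking as the number of samples grows.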


In order to find closed form solutions for our studied objects, we further suppose
the service times of the $N$ detectors are i.i.d. and follow an exponential distribution. This
approximation is practical and commonly used in performance evaluation. Let $\mu_e$
be the mean of the service time of the detectors; then Eq. (5.30) can be further
expressed as


$$ \mu = \frac{1}{p} \mu_e. \tag{5.31} $$


As a result, we can model the cloud firewall as an M/M/1 queue with arrival rate
$\lambda$ and service rate $\mu$. Taking advantage of the conclusions of queueing theory [26],
we can extract the key performance metrics that we are interested in.

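First of all, the probability that the system stays in state $p_k$ (namely, there are $k$
requests in the system) is


$$ \begin{aligned} p_0 &= 1 - \frac{\lambda}{\mu} = 1 - \frac{p\lambda}{\mu_e}, \\ p_k &= \left(\frac{\lambda}{\mu}\right)^k p_0 = \left(\frac{p\lambda}{\mu_e}\right)^k \left(1 - \frac{p\lambda}{\mu_e}\right). \end{aligned} \tag{5.32} $$

The probability density of the time in system is

$$ \Pr\{T = t\} = (\mu - \lambda) e^{-(\mu - \lambda)t}, \tag{5.33} $$

for $t > 0$.

The average time spent in the firewall system is

$$ T = \frac{p}{\mu_e - p\lambda}. \tag{5.34} $$

Based on Eq. (5.34), we obtain the average system throughput

$$ \gamma = \frac{1}{T} = \frac{1}{p} \mu_e - \lambda. \tag{5.35} $$

The average number of requests in the system is

$$ \bar{K} = \sum_{k=0}^{\infty} k p_k. \tag{5.36} $$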


For a firewall, we are interested in CPU utilization, which is also referred
to as carried load. We denote this metric as $U_{cpu}$, and it can be calculated as


$$ U_{cpu} = \gamma \mu = \frac{1}{p} \gamma \mu_e. \tag{5.37} $$


Let $\rho$ be the busy rate of the studied system. From a system viewpoint, in order
to make the system stable, the following has to be met.


$$ \rho = \frac{\lambda}{\mu} < 1. \tag{5.38} $$


5.3.1 Dynamic Resource Allocation for Cloud Firewall 


In this subsection, we focus on how to economically and dynamically allocate
resources to meet the requirements of a cloud firewall.

Following the resource investment function Eq. (5.6), in the cloud firewall case,
our system requirement $x$ is the average time in system of requests, $T$, which is
defined in Eq. (5.34). In order to avoid our cloud firewall becoming a bottleneck, the
time in system for a request has to be limited. Let $\Delta T$ be an acceptable threshold for
$T$; then our resource investment problem becomes an optimization issue as follows.


$$ \begin{aligned} &\text{minimize} \quad R(T) \\ &\text{s.t.} \quad T \le \Delta T. \end{aligned} \tag{5.39} $$


5.3.2 Single Chain vs Multiple Parallel Chains 


Suppose a single detection chain with a given resource works well for an arrival rate
$\lambda$. The number of requests to a cloud platform is dynamic. Let $m \in \mathbb{R}$, $m > 1$,
be the traffic strength. Intuitively, when the arrival rate increases to $m\lambda$, we should
invest more resources to handle the traffic. One problem arises naturally: when the
number of requests increases, how should we invest our resources?

Fig. 5.3 The two options for a resource investment for cloud firewall: (a) multiple parallel detection chains; (b) a single detection chain

In general, there are two options to deal with this case: (1) clone $\lceil m - 1 \rceil$ parallel
detection chains based on the original detection chain; (2) keep the original detection
chain, but increase the service capability of each detector to $m\mu$. We show these two
options in Fig. 5.3. We are interested in which one is the better investment.

In order to conduct the comparison, we use service rate as the study object in $R(\cdot)$.
When the arrival rate increases to $m\lambda$, the resource that we need for the multiple
detection chain strategy is $R(m\mu)$. Let $x$ be the expected service rate of the single
detection chain strategy; then,


$$ \frac{1}{x - m\lambda} = \Delta T = \frac{1}{\mu - \lambda}. \tag{5.40} $$

It is easy to find that

$$ x = \mu + (m - 1)\lambda. \tag{5.41} $$


As $\lambda < \mu$, combining this with Eq. (5.41), we have

$$ x < m\mu. \tag{5.42} $$


Based on property (b) of the investment function (5.6), we have

$$ R(x) \le R(m\mu). \tag{5.43} $$


Namely, for a given performance metric, the resource needed by the single detection
chain is no more than that needed by the multiple detection chain strategy.

Furthermore, we would like to know, for the same resource investment, how much
gain we can obtain from the single detection chain strategy compared with the multiple
detection chain strategy. We denote the system throughput for the two strategies as
$\gamma_m$ (multiple detection chains) and $\gamma_s$ (single detection chain), respectively, and their
average time in system for a request as $T_m$ and $T_s$, respectively.

As shown in Fig. 5.3a, the average time in system for the multiple detection chain
strategy is equivalent to that of one of its parallel detection chains. Based on Eq. (5.34), we have

$$ T_m = \frac{1}{\mu - \lambda}. \tag{5.44} $$

At the same time, based on Fig. 5.3b, we obtain the average time in system for the
single detection chain as

$$ T_s = \frac{1}{m\mu - m\lambda} = \frac{1}{m} T_m. \tag{5.45} $$

Therefore, we obtain

$$ \gamma_s = m \gamma_m. \tag{5.46} $$


As we know $m > 1$, for the same resource investment the single
detection chain strategy outperforms the multiple detection chain strategy by a factor of $m$.
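
The comparison can be reproduced numerically with the M/M/1 mean time in system. The sketch below assumes that, in the multiple chain option, traffic is split evenly so that each chain still sees the original arrival rate; the function names and the sample rates are hypothetical.

```python
def time_in_system(lam: float, mu: float) -> float:
    """Mean time in system of a stable M/M/1 queue."""
    if lam >= mu:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (mu - lam)


def compare(lam: float, mu: float, m: float):
    """Compare the two options when the arrival rate grows from lam to m * lam."""
    t_multi = time_in_system(lam, mu)            # each parallel chain still sees lam, Eq. (5.44)
    t_single = time_in_system(m * lam, m * mu)   # one chain sped up to m * mu, Eq. (5.45)
    x = mu + (m - 1) * lam                       # rate at which one chain matches t_multi, Eq. (5.41)
    return t_multi, t_single, x


# Hypothetical numbers: 800 requests/s, 1000 requests/s per chain, traffic grows 4x.
t_multi, t_single, x = compare(800.0, 1000.0, 4.0)
print(t_multi / t_single)   # speed-up of the single chain for the same total service rate
print(x, 4.0 * 1000.0)      # service rate needed by one chain vs. total rate of four chains
```

The first line shows the factor of $m$ from Eq. (5.46), and the second shows that a single chain needs a lower total service rate than $m$ parallel chains to hold the same delay, as stated by Eq. (5.42).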
Chapter 6
Future Work 


Abstract We summarize the book and discuss the possible future work. 


In this book brief, we have gone through the research work on DDoS attack and 
defence to date. We have discussed the short history of DoS and DDoS attacks, the 
reasons why it is hard to handle or eliminate such attacks, DDoS attack detection, 
attack source traceback, and DDoS attack and defence in cloud environment. 

Denial of service remains an open problem today, and we believe it will be a critical
threat in cyberspace for a long time. Usually, information security is classified into
three categories: confidentiality, integrity, and availability. DDoS attacks clearly
fall in the availability category. Denial of service is therefore a
big topic in information security. Due to its nature, DDoS attack and defence is
an endless battle between attackers and defenders. Once defenders design a new
defence method or eliminate a vulnerability, attackers will invent new strategies or
methods to circumvent them to achieve their malicious goals, and vice versa.

The information and communication technology sector is developing fast all over
the world. New hardware, new computing models, and new platforms are
continuously invented, developed and used in human society. We are witnessing the
emergence of many new ideas and applications in cyberspace, such as Cloud
Computing [1], Big Data [2], Energy-harvesting Networks [3], and so on. It is
hard for designers to predict the possible vulnerabilities of their new products. As
a result, defenders are generally passive in the battle against attackers. In addition,
these new products will introduce new security and privacy problems and may threaten
some existing solutions. For example, the cheap and readily available super computing
power of clouds poses a great threat to encryption mechanisms based on time
complexity.

We have to note that DDoS related research is only a very small part of cyber
security. In order to address cyber security problems, we have to obtain a better and
deeper understanding of cyberspace. As indicated by the American Research
Council, the research on network science started only recently [4]. Moreover, as the
Web and the Internet are growing ever larger and more complex, we face many
challenges in understanding these giant networks.

One essential problem that we are facing is that we do not have a feasible fundamental
theory for networks. It is commonly accepted by the networking community
that we lack theories for networks. Although there are many different theoretical
tools that serve specific problems well, they are usually effective only in a
narrow range, rather than in a wide range of circumstances. The main tools for networking are
Graph Theory and Queueing Theory. There have been some developments in the
two categories for computer science, such as the random graph model [5] in Graph
Theory, and network calculus [6,7] and stochastic network calculus [8] in Queueing
Theory. However, all the developments are far from the final goal. This is similar
to the story of the blind men and the elephant, and we need a global view of the problem.
We have seen some effort in narrowing the gap between theories and applications
in the domain of computer science, such as the effort at the end of last century [9],
and a recent work [10]. However, we have not seen ground-breaking progress in this
direction.

If we quickly cast our eyes over the history of science, we find that the necessary 
mathematical tools were always in place long before people actually used them for 
ground-breaking work. For example, Differential Geometry was quite mature when 
Albert Einstein used it for his work. Today, mathematics has advanced greatly in 
each of its branches, e.g., the recent development in number theory [11]. As a 
result, we should be confident that the mathematical tools are already available for 
us to model the Web, the Internet, and other complex networks. Our job is to deeply 
understand these mathematical tools and fit them to our computer science problems. 
If necessary, we may invent new mathematical tools to solve our problems. 

In terms of information technology, human beings had the first radio tube in 
1904, and the first television in 1924. However, it was not until 1948 that Claude 
Shannon established Information Theory, which is a fundamental theory for 
information 
technology. We know that the ARPANET was implemented in the late 1960s and 
became a public network, the Internet, in the 1990s. So far, we do not have a 
commonly accepted theory for this giant network. However, it is not difficult to 
believe that the expected theory will be in place sooner or later. 

It is time to dig harder, folks. 